Retrieving Inner Text of Html Tag C#

Question

I have a string that contains HTML. Inside this string there is an HTML tag whose inner text I want to retrieve. How can I do that in C#?

Here is the HTML tag whose inner text I want to retrieve:

<td width="100%" class="container">

I prefer the solution here: http://stackoverflow.com/questions/785715/how-can-i-strip-html-tags-from-a-string-in-asp-net – bresleveloper, Sep 17 '14 at 11:32

score 5 · Accepted Answer · edited May 23 '17 at 12:33


Use the Html Agility Pack.

Edit: something like this (not tested):

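using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
string html = /* whatever */;
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so guard before iterating
HtmlNodeCollection tds = doc.DocumentNode.SelectNodes("//td[@class='container']");
if (tds != null)
{
    foreach (HtmlNode td in tds)
    {
        string text = td.InnerText;
        // do whatever with text
    }
}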

You can also select the text nodes directly with a different XPath selector, for example //td[@class='container']/text().
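Selecting the text directly could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced. The class name TextNodeDemo, the method SelectDirectText, and the sample HTML are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class TextNodeDemo
{
    // Hypothetical helper: returns the text nodes that are direct children
    // of every <td class="container"> in the given HTML string.
    public static List<string> SelectDirectText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // text() matches text nodes one level below the td, not nested ones.
        var nodes = doc.DocumentNode.SelectNodes("//td[@class='container']/text()");
        if (nodes == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            result.Add(node.InnerText);
        }
        return result;
    }

    public static void Main()
    {
        // Sample input, invented for this example.
        string html = "<td width=\"100%\" class=\"container\">abc</td>";
        foreach (string text in SelectDirectText(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because text() only matches direct child text nodes, text wrapped in nested elements such as <b> is skipped. Using //td[@class='container']//text() instead should pick up text at any depth.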

answered Aug 30 '11 at 20:21 by Matt Ball · edited May 23 '17 at 12:33 by Community

Can you load an HTML document from a string that contains the HTML? Or do I have to give it a path? – John Dougherty Aug 30 '11 at 21:06

Answered my own question: use LoadHtml instead of Load. Thank you again! – John Dougherty Aug 30 '11 at 21:09

score 0 · Answer 2 · answered Jun 22 '23 at 15:37

Try a regex:

using System.Text.RegularExpressions;

public string GetInnerTextFromHtml(string htmlText)
{
    // Match one or more HTML tags (opening or closing), each followed by
    // any whitespace. Singleline lets '.' match newlines, so a tag that
    // spans several lines is still matched.
    Regex regex = new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    // Replace each run of tags (and the whitespace after them) with a
    // single space, then trim the leading and trailing spaces.
    string resultText = regex.Replace(htmlText, " ").Trim();

    return resultText;
}
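A short usage sketch for the function above may help show what it returns. The wrapper class HtmlTextDemo and the sample strings are invented for this example. Note that the pattern strips every tag in the input, so it yields the text of the whole string, not only the text inside one particular tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTextDemo
{
    // Same logic as the answer's method, made static so it can be called directly.
    public static string GetInnerTextFromHtml(string htmlText)
    {
        Regex regex = new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);
        return regex.Replace(htmlText, " ").Trim();
    }

    public static void Main()
    {
        // A single tag: only its text remains.
        Console.WriteLine(GetInnerTextFromHtml(
            "<td width=\"100%\" class=\"container\">Hello</td>"));
        // prints "Hello"

        // Several tags: the text of all of them is kept, separated by spaces.
        Console.WriteLine(GetInnerTextFromHtml(
            "<p>intro</p><td class=\"container\">Hello</td>"));
        // prints "intro Hello"
    }
}
```

If only the text of the td with class "container" is wanted, the string would have to be narrowed to that element first, or an HTML parser used instead.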
