3

I have a string that contains HTML. Inside this string there is an HTML tag whose inner text I want to retrieve. How can I do that in C#?

Here is the HTML tag whose inner text I want to retrieve:

<td width="100%" class="container">
Matt Ball
John Dougherty

2 Answers

5

Use the Html Agility Pack.


Edit: something like this (not tested):

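// Requires the HtmlAgilityPack package and: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
string html = /* whatever */;
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so check before iterating.
HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td[@class='container']");
if (cells != null)
{
    foreach (HtmlNode td in cells)
    {
        string text = td.InnerText;
        // do whatever with text
    }
}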

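You can also select the text directly with a different XPath selector, for example one ending in /text(), which returns the cell's text nodes instead of the td element itself.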
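As a rough sketch of selecting the text directly, the snippet below appends /text() to the XPath so that the query returns the text nodes that are direct children of the cell. The class name ContainerText, the helper GetContainerText, and the sample markup are made up for illustration. It assumes the HtmlAgilityPack NuGet package is referenced, and like the snippet above it has not been tested against every kind of markup.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ContainerText
{
    // Hypothetical helper: returns the direct text of every td.container cell,
    // or null when the XPath query matches nothing.
    public static string GetContainerText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // text() picks the text nodes that are direct children of the cell.
        // Use //text() after the predicate instead to include nested elements.
        HtmlNodeCollection textNodes =
            doc.DocumentNode.SelectNodes("//td[@class='container']/text()");
        if (textNodes == null)
        {
            return null;
        }

        // InnerText keeps entities such as &amp; as written, so decode them.
        string raw = string.Concat(textNodes.Select(n => n.InnerText));
        return HtmlEntity.DeEntitize(raw).Trim();
    }

    public static void Main()
    {
        // Sample markup invented for this example.
        string html =
            "<table><tr><td width=\"100%\" class=\"container\">Fish &amp; chips</td></tr></table>";
        Console.WriteLine(GetContainerText(html));
    }
}
```

Whether this is preferable to reading InnerText on the td depends on the markup: /text() skips text inside nested tags, while InnerText includes it.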



Community
Matt Ball
0

Try a regex. The method below strips every tag from the string and returns the text that remains:

// Requires: using System.Text.RegularExpressions;
public string GetInnerTextFromHtml(string htmlText)
{
    // Match one or more HTML tags (opening or closing),
    // each followed by any run of whitespace.
    // RegexOptions.Singleline lets '.' match newlines, so a tag
    // that spans several lines is still matched.
    Regex regex = new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    // Replace each run of tags (and the whitespace after it) with a single space,
    // then trim the leading and trailing whitespace.
    string resultText = regex.Replace(htmlText, " ").Trim();

    return resultText;
}
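To make the behaviour of this method concrete, here is a small self-contained sketch that wraps the same regex in a static class and runs it on two sample inputs. The class name HtmlText and both inputs are made up for illustration. The second call shows that the regex removes all tags in the string, so text from every element is returned, not only the text of the td with class "container".

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same regex as above: runs of tags, each followed by optional whitespace.
    private static readonly Regex TagRun =
        new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    public static string GetInnerTextFromHtml(string htmlText)
    {
        // Each run of tags becomes one space; outer whitespace is trimmed.
        return TagRun.Replace(htmlText, " ").Trim();
    }

    public static void Main()
    {
        // A single cell: only its text remains.
        string single = "<td width=\"100%\" class=\"container\">\n  Hello\n</td>";
        Console.WriteLine(GetInnerTextFromHtml(single)); // prints "Hello"

        // Two cells: the text of both is kept, not just the "container" one.
        string row = "<tr><td class=\"other\">A</td><td class=\"container\">B</td></tr>";
        Console.WriteLine(GetInnerTextFromHtml(row)); // prints "A B"
    }
}
```

If only one specific tag's text is needed, the input string would have to be narrowed to that tag first, or a parser-based approach used instead.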
AminRostami