I have a string that contains HTML. Inside this string there is an HTML tag whose inner text I want to retrieve. How can I do that in C#?
Here is the html tag whose inner text I want to retrieve:
<td width="100%" class="container">
Use the Html Agility Pack.
Edit: something like this (not tested):
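// requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
string html = /* whatever */;
doc.LoadHtml(html);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td[@class='container']");
if (cells != null)
{
    foreach (HtmlNode td in cells)
    {
        string text = td.InnerText;
        // do whatever with text
    }
}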
You can also select the text nodes directly by appending /text() to the XPath expression, for example //td[@class='container']/text().
Try a regex. Note that this strips every tag in the string rather than selecting one particular element.
// requires: using System.Text.RegularExpressions;
public string GetInnerTextFromHtml(string htmlText)
{
    // Match any HTML tag (opening or closing), followed by any run of whitespace.
    // Singleline lets '.' match newlines, so the HTML is treated as a single line.
    Regex regex = new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);
    // Replace each run of tags (and trailing whitespace) with a single space,
    // then trim the leading and trailing spaces.
    string resultText = regex.Replace(htmlText, " ").Trim();
    return resultText;
}
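The regex above removes tags from the whole string, while the question asks for the text of one specific cell. Here is a rough sketch of how the same idea could be narrowed to the td with class="container", using only the .NET base class library. The class name ContainerTextSketch and the method GetContainerText are made up for illustration, and the pattern assumes the cell is not nested inside another matching td and does not itself contain a td. For anything less regular than that, an HTML parser is the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class ContainerTextSketch
{
    // Assumption: a single, non-nested <td ... class="container" ...> cell,
    // with the class attribute in single or double quotes.
    private static readonly Regex ContainerCell = new Regex(
        "<td\\b[^>]*\\bclass\\s*=\\s*[\"']container[\"'][^>]*>(?<inner>.*?)</td\\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Any run of tags and/or whitespace.
    private static readonly Regex TagsAndWhitespace = new Regex("(<[^>]*>|\\s)+");

    public static string GetContainerText(string html)
    {
        Match match = ContainerCell.Match(html);
        if (!match.Success)
        {
            return null; // no matching cell found
        }

        // Collapse nested tags and whitespace runs inside the cell to single spaces.
        string text = TagsAndWhitespace.Replace(match.Groups["inner"].Value, " ");

        // Decode entities such as &amp; after the tags are gone, then trim.
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

For example, calling ContainerTextSketch.GetContainerText on "<table><tr><td width=\"100%\" class=\"container\">Hello <b>world</b> &amp; more</td></tr></table>" should give "Hello world & more", and a string without a matching cell gives null.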