Parsing HTML string using C#

Question

I have a string containing HTML, as shown below.

string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>"
    + "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";

Can we convert this HTML string into the text we would see in a browser after it has been parsed, so that the parsed string can be used wherever it is required?

Later I want to copy this data to a SharePoint list multiline rich text column. There I don't want these tags to appear, but

What exactly do you want to see in the parsed text? What do you mean by "as we see it in browser"? — Greg Oks, Mar 01 '17 at 08:39
Possible duplicate of [Grab all text from html with Html Agility Pack](http://stackoverflow.com/questions/4182594/grab-all-text-from-html-with-html-agility-pack) — ta.speot.is, Mar 01 '17 at 11:54

score 1 · Answer 1 · edited May 23 '17 at 12:25

This answer provides an example using HtmlAgilityPack, which is much more robust than rolling your own parsing or regular expressions.

XPath is your friend :)

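// Requires the HtmlAgilityPack NuGet package and "using HtmlAgilityPack;".
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>");

// Select every text node in the document and print its content.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
    Console.WriteLine("text=" + node.InnerText);
}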
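
Applied to the string from the question, the same idea might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the names `HtmlTextExtractor` and `ToPlainText` are made up for illustration. `SelectNodes` returns null when nothing matches, so the sketch guards against that, and `HtmlEntity.DeEntitize` decodes entities such as `&amp;`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, assumed to be installed

public static class HtmlTextExtractor // hypothetical helper name
{
    // Returns the text of an HTML fragment, one line per non-empty text node.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null)
            return string.Empty;

        var lines = new List<string>();
        foreach (HtmlNode node in textNodes)
        {
            // Decode entities like &amp; and skip whitespace-only nodes.
            string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
                lines.Add(text);
        }
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>"
            + "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";
        Console.WriteLine(ToPlainText(htmlText));
    }
}
```

Note that `//text()` also picks up text inside `<script>` and `<style>` elements, so real pages may need those nodes filtered out first.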

score 1 · Answer 2 · answered Nov 29 '18 at 21:55

Your question isn't entirely clear and cuts off at the end, but you can parse the data yourself if you want. Just examine each character using string indexes (e.g. htmlText[i]) to find the tags.
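
The character-by-character approach described above could be sketched roughly as follows. This is a minimal illustration, not a real HTML parser: the names `TagStripper` and `StripTags` are hypothetical, it assumes every `<` starts a tag and the next `>` ends it, and it puts a space wherever a tag was removed, including around inline tags such as `<b>`.

```csharp
using System;
using System.Net;
using System.Text;

public static class TagStripper // hypothetical helper name
{
    // Copies every character that lies outside a <...> tag.
    public static string StripTags(string html)
    {
        var sb = new StringBuilder(html.Length);
        bool insideTag = false;

        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            if (c == '<')
            {
                insideTag = true;
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;
                // Separate the text of adjacent elements with a single space.
                if (sb.Length > 0 && sb[sb.Length - 1] != ' ')
                    sb.Append(' ');
            }
            else if (!insideTag)
            {
                sb.Append(c);
            }
        }

        // Decode entities such as &amp; and drop leading/trailing spaces.
        return WebUtility.HtmlDecode(sb.ToString()).Trim();
    }

    public static void Main()
    {
        string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>"
            + "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";
        Console.WriteLine(StripTags(htmlText));
        // prints: This is heading 1 This is some text. This is heading 2 This is some other text.
    }
}
```

Attribute values containing `>`, comments, and `<script>` blocks would all confuse this scanner.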

If you need something a little more robust, use HtmlMonkey or HtmlAgilityPack to parse it for you.

score -1 · Answer 3 · answered Mar 01 '17 at 08:50

The best way is to use a regular expression to extract the inner text between the HTML tags. Something like this might work: ((.+?)</h.?>)+((.+?)</p.?>)
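
For markup as simple and regular as the string in the question, a regex-based extraction might be sketched as below. The pattern here is a swapped-in illustration, not the one quoted above, and the names `RegexTextExtractor` and `ExtractText` are hypothetical. It assumes heading and paragraph tags carry no attributes and contain no nested tags; anything less regular is usually better served by an HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTextExtractor // hypothetical helper name
{
    // Matches attribute-free <h1>..<h6> and <p> elements and captures
    // their inner text in group 2; \1 requires the matching closing tag.
    private static readonly Regex Element = new Regex(
        @"<(h[1-6]|p)>(.*?)</\1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractText(string html)
    {
        var result = new List<string>();
        foreach (Match m in Element.Matches(html))
            result.Add(m.Groups[2].Value);
        return result;
    }

    public static void Main()
    {
        string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>"
            + "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";
        foreach (string text in ExtractText(htmlText))
            Console.WriteLine(text);
    }
}
```

With nested markup such as `<p>foo <a href='...'>bar</a> baz</p>`, the captured group would still contain the inner `<a>` tag, which is one reason this approach is fragile.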

Mohammad Nikravesh
