-2

I have a string containing HTML, as shown below.

string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>" +
    "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";

Can we convert this HTML string into the text we would see in a browser after it has been rendered, so that the resulting string can be used wherever it is needed later?

Later I want to copy this data into a multiline rich text column of a SharePoint list. I don't want the tags to appear there, but

vikash kumar

3 Answers

1

This answer provides an example using HtmlAgilityPack, which is much more robust than rolling your own parsing or regular expressions.

XPath is your friend :)

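// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p>foo <a href='http://www.example.com'>bar</a> baz</p></body></html>");

// Select every text node in the document and print its text.
// Note: SelectNodes can return null when nothing matches the XPath.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
    Console.WriteLine("text=" + node.InnerText);
}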
ta.speot.is
1

Your question isn't entirely clear and cuts off at the end. But you can actually parse the data if you want. Just examine each character to find the tags using string indexes (e.g. htmlText[i]).
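
The character-by-character idea could look roughly like the sketch below. It is only a minimal illustration, not a real HTML parser: the class and method names (TagStripper, StripTags) are made up for this example, and it assumes that '<' only ever starts a tag, so comments, scripts, and a '>' inside an attribute value would trip it up.

```csharp
using System;
using System.Net;
using System.Text;

public static class TagStripper
{
    // Hypothetical helper: copies every character that lies outside a <...> tag.
    // Assumption: '<' always opens a tag and the next '>' closes it.
    public static string StripTags(string html)
    {
        var sb = new StringBuilder(html.Length);
        bool insideTag = false;

        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            if (c == '<')
            {
                insideTag = true;
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;
                // Put a single space where a tag was, so text from
                // neighbouring elements does not run together.
                if (sb.Length > 0 && sb[sb.Length - 1] != ' ')
                {
                    sb.Append(' ');
                }
            }
            else if (!insideTag)
            {
                sb.Append(c);
            }
        }

        // Decode entities such as &amp; and drop leading/trailing spaces.
        return WebUtility.HtmlDecode(sb.ToString()).Trim();
    }

    public static void Main()
    {
        string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>" +
            "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";

        // Prints: This is heading 1 This is some text. This is heading 2 This is some other text.
        Console.WriteLine(StripTags(htmlText));
    }
}
```

Note that this flattens everything onto one line; if the headings and paragraphs should end up on separate lines, the space would have to become a line break for block-level tags.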

If you need something a little more robust, use HtmlMonkey or HtmlAgilityPack to parse it for you.

Jonathan Wood
-1

The best way is to use a regular expression to extract the inner text between the HTML tags. Something like this might work: ((.+?)</h.?>)+((.+?)</p.?>)
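
For comparison, here is a minimal sketch of a regex-based approach. It does not use the pattern above; it swaps in a simpler, commonly used one that removes anything that looks like a tag and keeps the rest. The names RegexTagStripper and StripTags are hypothetical, and the usual caveat applies: a regex does not understand HTML, so comments, scripts, or a '>' inside an attribute value can break it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class RegexTagStripper
{
    // Hypothetical helper: strips tags with a regex instead of parsing the HTML.
    public static string StripTags(string html)
    {
        // Replace anything of the form <...> with a space.
        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");

        // Decode entities such as &amp; into their characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse runs of whitespace left behind by the removed tags.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string htmlText = "<h1>This is heading 1</h1><p>This is some text.</p>" +
            "<hr><h2>This is heading 2</h2><p>This is some other text.</p><hr>";

        // Prints: This is heading 1 This is some text. This is heading 2 This is some other text.
        Console.WriteLine(StripTags(htmlText));
    }
}
```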

Mohammad Nikravesh