1

Are there any free/open-source C# libraries to extract data from HTML?

Given the input below:

<div style="...">
 text part 1
</div>
<div style="...">
 text part 2
</div>

I want the output to be:

text part 1 text part 2
The Mask
rovsen

2 Answers

6

Yes, you can use HtmlAgilityPack to parse HTML with XPath queries, as if it were XML.
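For the input in the question, a minimal sketch of the XPath approach might look like the following. The class and method names (`DivTextExtractor`, `ExtractDivText`) are made up for illustration, and it assumes the `<div>` elements are not nested, since `//div` would otherwise match both the outer and the inner element and repeat their text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class DivTextExtractor
{
    // Joins the trimmed text of every <div> in the input with single spaces.
    // Assumes the divs are not nested inside each other.
    public static string ExtractDivText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var divs = doc.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return string.Empty;
        }

        return string.Join(" ", divs.Select(d => d.InnerText.Trim()));
    }
}
```

Calling `DivTextExtractor.ExtractDivText(html)` on the two `<div>` blocks from the question should give `text part 1 text part 2`.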

carla
Romias
4

You can use HtmlAgilityPack, a very good library.

Then:

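using System.Text;
using HtmlAgilityPack;

// Returns the text content of an HTML string with all tags removed.
public string StripHTMLTags(string str)
{
    StringBuilder pureText = new StringBuilder();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(str);

    // Append the inner text of each top-level node, in document order.
    foreach (HtmlNode node in doc.DocumentNode.ChildNodes)
    {
        pureText.Append(node.InnerText);
    }

    return pureText.ToString();
}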
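Note that `InnerText` keeps the line breaks and indentation from the markup, so the result for the question's input will not yet be a single line. One way to get there, sketched here with a made-up helper name (`TextCleanup.CollapseWhitespace`), is to collapse the whitespace afterwards:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for post-processing the stripped text.
public static class TextCleanup
{
    // Collapses every run of whitespace (spaces, tabs, newlines) into one space
    // and trims the leading and trailing whitespace.
    public static string CollapseWhitespace(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

With this, `TextCleanup.CollapseWhitespace(StripHTMLTags(html))` should turn the question's input into `text part 1 text part 2`.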
The Mask