3

Possible Duplicate:
What is the best way to parse html in C#?

Is there a way to parse HTML, or to convert HTML to XML, so that I can easily extract information from a website?

I'm working with C#.

Thank you,

Community
Jerry

2 Answers

5

HtmlAgilityPack (the Html Agility Pack library) is what you are looking for. Check out this tutorial: "Parsing HTML Document with HTMLAgilityPack".
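
For a rough idea of what that looks like, here is a minimal sketch that pulls every link out of a piece of HTML. It assumes the HtmlAgilityPack NuGet package is referenced; the `LinkExtractor` class name and the sample markup are made up for illustration and are not from the tutorial.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class LinkExtractor // hypothetical helper, not part of the library
{
    // Returns (href, link text) pairs for every <a href="..."> in the markup.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerates malformed, non-XML markup

        var links = new List<KeyValuePair<string, string>>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode a in anchors)
        {
            string href = a.GetAttributeValue("href", string.Empty);
            links.Add(new KeyValuePair<string, string>(href, a.InnerText.Trim()));
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample markup with an unclosed <p>, as real pages often have.
        string html = "<html><body><p>See <a href=\"/docs\">the docs</a>"
                    + "<p><a href=\"http://example.com\">Example</a></body></html>";

        foreach (var link in ExtractLinks(html))
            Console.WriteLine(link.Key + " -> " + link.Value);
    }
}
```

To work on a live page instead of a string, `new HtmlWeb().Load(url)` returns the same kind of `HtmlDocument`, so the XPath part stays unchanged.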

carla
Habib
5

You can use the COM objects in the Microsoft HTML Object Library (mshtml) to load HTML, and then use its object model to navigate around. An example is shown below:

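// Needs a COM reference to "Microsoft HTML Object Library" and:
// using System; using System.IO; using System.Net; using mshtml;
string html;
using (WebClient webClient = new WebClient())
using (Stream stream = webClient.OpenRead(new Uri("http://www.google.com")))
using (StreamReader reader = new StreamReader(stream))
{
  // Download the page source as a string.
  html = reader.ReadToEnd();
}
// Load the markup into an MSHTML document.
IHTMLDocument2 doc = (IHTMLDocument2)new HTMLDocument();
doc.write(html);
doc.close(); // finish writing so the document is fully parsed
// Print the tag name of every element in the document.
foreach (IHTMLElement el in doc.all)
  Console.WriteLine(el.tagName);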
Michael