Possible Duplicate:
What is the best way to parse html in C#?
Is there a way to parse HTML, or convert HTML to XML, so that I can easily extract information from a website?
I'm working in C#.
Thank you,
Html Agility Pack (HtmlAgilityPack) is what you are looking for. Check out this tutorial: Parsing HTML Document with HTMLAgilityPack.
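To give a rough idea of what that looks like, here is a minimal sketch that pulls every link target out of an HTML string. It assumes the HtmlAgilityPack NuGet package is referenced; the `LinkExtractor` class and `ExtractLinks` method are hypothetical names made up for this illustration, not part of the library.

```csharp
// Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor // hypothetical helper, not a library type
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        // Parse the markup; the parser tolerates malformed HTML.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // XPath query for anchors with an href attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

You would call it as `LinkExtractor.ExtractLinks(html)` with whatever markup you downloaded. The same `SelectNodes` approach should work for other elements by changing the XPath expression.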
You can use the COM objects in the Microsoft HTML Object Library (mshtml)
to load HTML, and then use its object model to navigate around. An example is shown below:
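// Requires a COM reference to "Microsoft HTML Object Library" (mshtml).
using System;
using System.IO;
using System.Net;
using mshtml;

// Download the page source.
string html;
using (WebClient webClient = new WebClient())
using (Stream stream = webClient.OpenRead(new Uri("http://www.google.com")))
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}

// Load the markup into an MSHTML document and print every element's tag name.
IHTMLDocument2 doc = (IHTMLDocument2)new HTMLDocument();
doc.write(html);
doc.close(); // finish writing so the document is fully parsed
foreach (IHTMLElement el in doc.all)
    Console.WriteLine(el.tagName);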