Is there an alternative to WebBrowser control for DOM traversal?

Question

I am currently using a WebBrowser control in my Windows Forms application to navigate to a URL. Once I am at that URL, I use the FirstChild in conjunction with NextSibling methods of the HtmlElement class to walk the document tree from the WebBrowser.Document object. The reason I do this is to get information from a page and store this information into a database.

Here is the crux of my question: Do I really need to use the WebBrowser class? I currently do not need to display the web page to the user, only some of the information found in the page. Is there a better way to do this without relying on this class? Something solid which can do DOM traversal would be required, but as mentioned above, I do not need to display the web page.

Regards Crouz

score 1 · Answer 1 · edited May 23 '17 at 11:59

You can use a WebClient to download the HTML without displaying the page. You can then use a library such as HTML Agility Pack to parse that string into an HtmlAgilityPack.HtmlDocument, which you can traverse much like WebBrowser.Document.

Example:

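// Requires "using System.Net;" and the HtmlAgilityPack NuGet package.
using (WebClient wc = new WebClient())
{
    // Download the raw HTML; nothing is rendered.
    string html = wc.DownloadString("http://www.foo.bar/"); // Change as required.
    // Parse the string into a traversable document.
    HtmlAgilityPack.HtmlDocument h = new HtmlAgilityPack.HtmlDocument();
    h.LoadHtml(html);
}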
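Once the document is loaded, the FirstChild/NextSibling walk described in the question carries over, since HtmlNode exposes properties with the same names. The following is only a minimal sketch: the DomWalker class and its Walk and ElementOutline methods are hypothetical names made up for illustration, not part of HTML Agility Pack, and you would replace the outline-building with whatever extraction your database needs.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper for illustration; not part of HTML Agility Pack.
static class DomWalker
{
    // Depth-first walk using FirstChild/NextSibling.
    // Records each element's tag name, indented two spaces per nesting level.
    public static void Walk(HtmlNode node, int depth, List<string> lines)
    {
        for (HtmlNode child = node.FirstChild; child != null; child = child.NextSibling)
        {
            // Skip text and comment nodes; only elements are recorded.
            if (child.NodeType == HtmlNodeType.Element)
            {
                lines.Add(new string(' ', depth * 2) + child.Name);
                Walk(child, depth + 1, lines);
            }
        }
    }

    // Parses an HTML string and returns an indented outline of its elements.
    public static List<string> ElementOutline(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();
        Walk(doc.DocumentNode, 0, lines);
        return lines;
    }
}
```

Passing the html string downloaded above to DomWalker.ElementOutline gives one list entry per element, in document order. If you prefer not to walk the tree by hand, HtmlNode also has a ChildNodes collection you can iterate with foreach.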

Reason to use HTML Agility Pack:

"The HtmlDocument class is a wrapper around the native IHtmlDocument2 COM interface. You cannot easily create it from a string....."

In other words, the Windows Forms HtmlDocument (System.Windows.Forms.HtmlDocument) cannot easily be created without a WebBrowser control.

From https://stackoverflow.com/a/4935482/4546874.

However, if you would rather keep the WebBrowser control, you can hide it (for example, by setting its Visible property to false).

I came back to my question to see if there were any answers, and it seems my reply to Farhan's response didn't get posted, so here it goes again. Thanks, Farhan, for your response. I shall investigate the WebClient class and see if it will serve my purpose. Regards, Crouz - Crouzilles, Nov 11 '15 at 11:13
