2

I'm trying to use the WebBrowser class, but it doesn't work.

My code:

WebBrowser browser = new WebBrowser();
browser.Navigate("http://www.google.com");

while(browser.DocumentText == "")
{
    continue;
}
string html = browser.DocumentText;

browser.DocumentText is always "". Why?

Yuck
carck3r
  • 2
    Well, right off the bat, I can almost guarantee that `Navigate` is NOT an asynchronous function, thus `DocumentText` will not change after Navigate returns--in other words, this will be an infinite loop whenever `DocumentText` is empty. – riwalk Dec 15 '11 at 20:34

5 Answers

5

You should use the DocumentCompleted event, and if you don't have a Windows Forms application, an ApplicationContext might also be needed.

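using System;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main()
    {
        Context ctx = new Context();
        Application.Run(ctx); // blocks until ExitThread() is called in Context

        // ctx.Html now holds the downloaded HTML
    }
}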

class Context : ApplicationContext
{
    public string Html { get; set; }

    public Context()
    {
        WebBrowser browser = new WebBrowser();
        browser.AllowNavigation = true;
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
        browser.Navigate("http://www.google.com");
    }

    void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        Html = ((WebBrowser)sender).DocumentText;
        this.ExitThread();
    }
}
Krzysztof
3

The WebBrowser isn't going to do its job until the current thread finishes its work. If you change it to something like this:

        WebBrowser browser = new WebBrowser();
        // Subscribe before navigating so the handler is in place when the event fires.
        browser.Navigated += (s, e) =>
            {
                var html = browser.DocumentText;
            };
        browser.Navigate("http://www.google.com");

The variable will be set.

But, as others have mentioned, DocumentCompleted is a better event to attach to, because by the time it fires the entire document has been loaded (appropriate name!):

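        WebBrowser browser = new WebBrowser();

        // Subscribe before navigating so the handler is in place when the event fires.
        browser.DocumentCompleted += (s, e) =>
            {
                var html = browser.DocumentText;
                html.ToString(); // no-op: the result is discarded
            };
        browser.Navigate("http://www.google.com");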
McKay
  • It works, but it doesn't load HTML with javascript. I can use HttpWebRequest, but I need javascript. Help me. – carck3r Dec 15 '11 at 21:02
  • @carck3r What do you mean by "need javascript"? The javascript will either be inline and loaded by either of those two methods, or are accessible by separate web calls. – McKay Dec 15 '11 at 21:38
  • @carck3r If you look at the source of that page in a typical browser, you will see stuff like `` That's built into the page. If you want something that will parse your javascript for you, you'll have to write it yourself. The webbrowser control will display it for you in a web form, and will parse the javascript for you though. – McKay Dec 15 '11 at 21:53
  • @carck3r The WebBrowser uses the IE rendering engine. If you ask for the document text, you'll get the same thing that view source gives you in IE. But the presentation will also be the same as the IE rendering engine, and it will parse and evaluate the javascript just like IE would. It is not really possible to get what you're asking without domain specific information, as javascript doesn't always stop modifying a page. It could modify that page continuously. – McKay Dec 15 '11 at 22:07
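The distinction drawn in these comments, between the downloaded source and the page as modified by scripts, can be sketched as follows. This is only a rough illustration, not a definitive solution: it assumes a console project that references System.Windows.Forms, the class name `RenderedHtmlSketch` is made up for the example, and the `ReadyState` check is just one common way to skip the extra `DocumentCompleted` events raised for frames. Scripts that keep changing the page after `DocumentCompleted` will not be reflected in the snapshot.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical example class; the name is not from the thread.
static class RenderedHtmlSketch
{
    [STAThread] // WebBrowser wraps an ActiveX control and needs an STA thread
    static void Main()
    {
        string source = null;   // markup as downloaded
        string rendered = null; // markup serialized from the live DOM

        using (var browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true; // avoid script-error dialogs

            // Subscribe before navigating so the event cannot be missed.
            browser.DocumentCompleted += (s, e) =>
            {
                // DocumentCompleted can fire once per frame; wait until the
                // control reports that the whole page has finished loading.
                if (browser.ReadyState != WebBrowserReadyState.Complete)
                    return;

                source = browser.DocumentText;

                // Body.OuterHtml reflects the DOM at this moment, including
                // changes made by scripts that have already run.
                if (browser.Document != null && browser.Document.Body != null)
                    rendered = browser.Document.Body.OuterHtml;

                Application.ExitThread(); // stop the message loop started below
            };

            browser.Navigate("http://www.google.com");
            Application.Run(); // pump messages so the navigation can progress
        }

        Console.WriteLine("Source length:   " + (source ?? "").Length);
        Console.WriteLine("Rendered length: " + (rendered ?? "").Length);
    }
}
```

If the page builds its content on a timer or after further web calls, some page-specific signal (for example, waiting for a known element to appear) would still be needed before taking the snapshot.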
2

Attach to the DocumentCompleted event. The code is below:

browser.DocumentCompleted += (s, e) =>
{
    string html = browser.DocumentText;
};
Vimal
L.B
1

If you need the DocumentText, you should handle the DocumentCompleted event:

  browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);

See the event handler below:

void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = (WebBrowser)sender;
    string text = wb.DocumentText;
}
scartag
-1

Try something like this

string html = "http://www.google.com/";
string url = html;
if (!url.StartsWith("http://") && !url.StartsWith("https://"))
{
   url = "http://" + url;
}
browser.Navigate(new Uri(url)); 

Replace it within your while loop where necessary.

MethodMan
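For readers who want to keep the shape of the while loop from the question, here is a minimal sketch of a wait that lets the control make progress. It rests on a few assumptions: a console project referencing System.Windows.Forms, a made-up class name `PumpingWaitSketch`, and an arbitrary 10 ms pause. `Navigate` returns before the page has loaded, and the load only advances while the thread processes window messages, so the loop calls `Application.DoEvents()` instead of spinning. `DoEvents` has well-known re-entrancy pitfalls, so the event-based answers above are usually the safer choice.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical example class; the name is not from the thread.
static class PumpingWaitSketch
{
    [STAThread] // WebBrowser wraps an ActiveX control and needs an STA thread
    static void Main()
    {
        using (var browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true; // avoid script-error dialogs
            browser.Navigate("http://www.google.com");

            // A bare `continue` loop never processes window messages, so the
            // page never loads. DoEvents() handles the pending messages on
            // each pass, which lets the navigation advance.
            while (browser.ReadyState != WebBrowserReadyState.Complete)
            {
                Application.DoEvents();
                Thread.Sleep(10); // arbitrary pause to avoid a hot spin
            }

            string html = browser.DocumentText;
            Console.WriteLine(html.Length);
        }
    }
}
```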