The problem I'm facing is that, when dealing with the WebBrowser control (whether it's visible or not), it causes the UI to freeze for a small amount of time while navigating, which becomes very noticeable and unreliable when having to open several URLs sequentially.

I'm currently using Noseratio's NavigateAsync extension method to navigate to multiple URLs silently & asynchronously: (Feel free to skip reading the code and continue with the question)

public static async Task<string> NavigateAsync(this WebBrowser webBrowser, string url, CancellationToken token)
{
    var tcs = new TaskCompletionSource<bool>();
    WebBrowserDocumentCompletedEventHandler handler = (s, arg) => tcs.TrySetResult(true);

    using (token.Register(() => { webBrowser.Stop(); tcs.TrySetCanceled(); }, true))
    {
        webBrowser.DocumentCompleted += handler;
        try
        {
            webBrowser.Navigate(url);
            await tcs.Task; // wait for DocumentCompleted
        }
        finally
        {
            webBrowser.DocumentCompleted -= handler;
        }
    }

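    // Poll the rendered HTML until it stops changing, to allow for dynamic (AJAX) content.
    // POLL_DELAY is the polling interval passed to Task.Delay; it is defined elsewhere (not shown here).
    var documentElement = webBrowser.Document.GetElementsByTagName("html")[0];
    var html = documentElement.OuterHtml;
    while (true)
    {
        await Task.Delay(POLL_DELAY, token);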
        if (webBrowser.IsBusy)
            continue;

        var htmlNow = documentElement.OuterHtml;
        if (html == htmlNow) break; 

        html = htmlNow;
    }

    token.ThrowIfCancellationRequested();
    return html;
}

But even the simplest code like the following:

WebBrowser wb = new WebBrowser() { ScriptErrorsSuppressed = true };
wb.Navigate("https://www.google.com/");

..still has the same effect.

Here's a quick demo video showing the problem with the simplest code possible.

I also tried having the WebBrowser running on a different STA thread, but still no luck.

So, is there a way to avoid that freeze while dealing with WebBrowser?


Before you suggest replacing it with HttpClient or WebClient plus HTMLAgilityPack, please note that I'm using WebBrowser in order to get the displayed text, formatted as close as possible to how it's displayed in the browser (i.e., as close as possible to manually selecting and copying the text). Every solution I tried (or found online) that didn't use a browser failed to achieve this; even the one that produced the closest result wasn't good enough.

  • Hmm, this is not a common complaint at all, WebBrowser is just a wrapper for native code that is heavily threaded under the hood. That GetElementsByTagName() call certainly ought to be expensive on a large document since it retrieves *everything*. Use a profiler to see where the expense goes, one that can also profile native code. And temporarily disable anti-malware since it invariably acts up when doing anything internetty. And consider doing this all on an STA worker thread since you don't care about the view. – Hans Passant Mar 04 '18 at 15:36
  • @HansPassant I'm confident that `GetElementsByTagName` isn't what's causing the issue here. Please check the part that says *"But even the simplest code like the following.."*. – 41686d6564 stands w. Palestine Mar 04 '18 at 15:39
  • If nothing else helps - you can run that code in separate UI thread, so it will (hopefully) not freeze your main UI thread. – Evk Mar 04 '18 at 15:52
  • @HansPassant It seems to be an uncommon complaint because I didn't find many questions about this particular issue, but I tried it on two different PCs and had the same behavior. Perhaps people mostly use WebBrowser docked in a window, so it doesn't really matter what happens while the page is loading? Anyway, here's a [30sec demo video](https://drive.google.com/open?id=1t1Qr9uLKY47WzXfKUZEe84Uf59WlY6za). – 41686d6564 stands w. Palestine Mar 04 '18 at 16:08
  • @Evk Thanks for the suggestion. I already tried that, and it's mentioned in the question. – 41686d6564 stands w. Palestine Mar 04 '18 at 16:09
  • If you use multiple `WebBrowser` controls, your application freezes while those browsers are loading. To solve the problem, I used [this solution](https://stackoverflow.com/a/40057126/3110834). Then I was able to load web browser controls behind the scenes without any lag in the UI. – Reza Aghaei Mar 04 '18 at 17:13

1 Answer


I can confirm that when you load a WebBrowser control, the UI freezes for a few moments. If you use multiple instances of the WebBrowser control to load multiple URLs, the lag becomes annoying and you cannot interact with the main window.

To reproduce the problem, you can use the following code:

string google = "http://www.google.com";
var urls = Enumerable.Range(1, 100).Select(x => google).ToList();
foreach (var url in urls)
{
    var w = new WebBrowser() { ScriptErrorsSuppressed = true };
    w.DocumentCompleted += (obj, args) =>
        {
            var txt = ((WebBrowser)obj).DocumentText;
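            // Show at most the first 200 characters; a fixed length would throw on shorter documents.
            this.textBox1.Text = DateTime.Now.ToString() + Environment.NewLine
                + txt.Substring(0, Math.Min(200, txt.Length)) + "...";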
        };
    w.Navigate(url);
}

To solve the problem, you can create a method that loads the WebBrowser control on another thread and returns a Task<string> that completes when the browser's document has finished loading. I've created a BrowserBasedWebScraper in this post, and you can use it to get the content of a WebBrowser control behind the scenes without lagging the UI:

string google = "http://www.google.com";
var urls = Enumerable.Range(1, 100).Select(x => google).ToList();
foreach (var url in urls)
{
    var txt = await BrowserBasedWebScraper.LoadUrl(url);
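    // Show at most the first 200 characters; a fixed length would throw on shorter documents.
    this.textBox1.Text = DateTime.Now.ToString() + Environment.NewLine
        + txt.Substring(0, Math.Min(200, txt.Length)) + "...";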
}
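
For readers who don't have the linked post at hand, here is a minimal sketch of the general idea described above: create the WebBrowser on a dedicated STA thread with its own message loop, and expose the result as a Task<string>. This is not the BrowserBasedWebScraper code from the post. The class name StaBrowserLoader and the method name LoadUrlAsync are hypothetical names chosen for illustration, and the sketch assumes a Windows Forms project.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper for illustration only; not the BrowserBasedWebScraper from the linked post.
public static class StaBrowserLoader
{
    public static Task<string> LoadUrlAsync(string url)
    {
        var tcs = new TaskCompletionSource<string>();

        var thread = new Thread(() =>
        {
            try
            {
                using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
                {
                    browser.DocumentCompleted += (s, e) =>
                    {
                        // DocumentCompleted also fires for frames; wait for the top-level document.
                        if (e.Url != browser.Url) return;

                        tcs.TrySetResult(browser.DocumentText);
                        Application.ExitThread(); // stop this thread's message loop
                    };

                    browser.Navigate(url);
                    Application.Run(); // pump messages on this thread until ExitThread is called
                }
            }
            catch (Exception ex)
            {
                // Surface failures to the awaiting caller instead of crashing the worker thread.
                tcs.TrySetException(ex);
            }
        });

        // The WebBrowser control wraps an ActiveX component and requires an STA thread.
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();

        return tcs.Task;
    }
}
```

Under these assumptions, a caller in an async event handler would write `var txt = await StaBrowserLoader.LoadUrlAsync(url);`. Each call spins up its own thread and browser instance, so for long URL lists you may want to limit how many run at once.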

You can also download a working example from this repository.

Reza Aghaei