I'm not new to programming but am new to using WebBrowser controls in C# WinForm apps.

I have two WebBrowser controls that I dynamically load onto a form. The first navigates to a URL, and upon completion I load its DocumentText into an HtmlAgilityPack document. I use XPath to parse some links that are then passed to the second browser control. Everything works fine until after the second browser control loads: its .DocumentText is less than 700 bytes long. If I bypass the rest of the routine and return to the screen, the correct and full page is displayed in the second control; however, I can't get that to happen inside the routine. The bare bones of the code are as follows.

private WebBrowser webBrowser = new WebBrowser();
private WebBrowser webBrowser2 = new WebBrowser();
private TaskCompletionSource<bool> tcs = null;
private TaskCompletionSource<bool> tcs2 = null;
private string lastnav = "";
private string lastMessage = "";

private void webBrowser_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
    lastnav = e.Url.ToString();
    this.txtNavigated.Text += e.Url.ToString() + "\r\n\r\n";
    if (webBrowser.Document != null && webBrowser.Document.Cookie != null)
        this.txtNavigated.Text += webBrowser.Document.Cookie + "\r\n\r\n";
    this.txtNavigated.Update();
}

private async void TryNavigate(string url)
{
    webBrowser.Location = new System.Drawing.Point(12, 226);
    webBrowser.Size = new System.Drawing.Size(1070,100);
    webBrowser2.Location = new System.Drawing.Point(12, 327);
    webBrowser2.Size = new System.Drawing.Size(1070, 100);

    this.Controls.Add(webBrowser);
    this.Controls.Add(webBrowser2);

    tcs = new TaskCompletionSource<bool>();
    WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (sender2, e2) => tcs.TrySetResult(true);
    WebBrowserNavigatingEventHandler docNavigatingHandler = webBrowser_Navigating;

    try
    {
        Uri baseUri = new Uri("https://www.labcorp.com/wps/portal/provider/testmenu");
        Uri newUri = null;
        webBrowser.DocumentCompleted += documentCompletedHandler;
        webBrowser.Navigating += docNavigatingHandler;
        try
        {
            webBrowser.Navigate(baseUri.AbsoluteUri);
            await tcs.Task;
        }
        catch (WebException webex) {
            lastMessage = webex.Message;
        }
        catch (Exception ex)
        {
            lastMessage = ex.Message;
        }
        finally
        {
            webBrowser.DocumentCompleted -= documentCompletedHandler;
        }
        webBrowser2.Navigate("localhost");
        webBrowser2.Document.Cookie = webBrowser.Document.Cookie;
        webBrowser2.Navigating += docNavigatingHandler;

        HtmlAgilityPack.HtmlDocument azlinks = new HtmlAgilityPack.HtmlDocument();
        azlinks.LoadHtml(webBrowser.DocumentText);
        // get A - Z
        var azlinkNodes = azlinks.DocumentNode.SelectNodes("//div[@class='searchDiv']/table/tr/td/a");
        if (azlinkNodes != null)
        {
            tcs2 = new TaskCompletionSource<bool>();
            // The handler must be declared before it is subscribed, otherwise this does not compile.
            WebBrowserDocumentCompletedEventHandler documentCompletedHandler2 = (sender2, e2) => tcs2.TrySetResult(true);
            webBrowser2.DocumentCompleted += documentCompletedHandler2;
            if (Uri.TryCreate(baseUri, azlinkNodes[0].Attributes["href"].Value, out newUri))
            {
                try
                {
                    webBrowser2.Navigate(newUri);
                    await tcs2.Task;
                }
                finally
                {
                    webBrowser2.DocumentCompleted -= documentCompletedHandler2;
                }

                // **************************************************
                // will not come out of this test loop
                //while (webBrowser2.DocumentText.Length < 10000) {
                //    webBrowser2.Update();
                //    System.Threading.Thread.Sleep(500);
                //}

                MessageBox.Show("webBrowser2.DocumentText.Length = " + webBrowser2.DocumentText.Length.ToString(), "Length");
            }
        }
    }
    catch (Exception ex)
    {
        lastMessage = ex.Message;
    }
}

I created a test button on the form so that after the routine had returned to the screen I could click it and check what the second browser was up to.

private void button1_Click(object sender, EventArgs e)
{
    MessageBox.Show("webBrowser2.DocumentText.Length = " + webBrowser2.DocumentText.Length.ToString(), "Length");
}

Upon clicking the button, the second browser's .DocumentText length was correct (about 130K+), but I don't see a way to get it to return that in the middle of the routine. As you can see in the commented-out code, I tested whether Update() would help, but it stays in the loop forever.

Does anyone know of a way for it to finish loading without returning to the screen?

Any help would surely be appreciated.

Thanks

  • You cannot use `Thread.Sleep`; you need to keep pumping messages. It's tempting to use `DoEvents`, but that would also be wrong. Check [this](http://stackoverflow.com/questions/22239357/how-to-cancel-task-await-after-a-timeout-period/22262976#22262976) for some fresh ideas. – noseratio Feb 08 '15 at 11:03
  • `Thread.Sleep` is for half a second... not really caring if it takes 2 minutes to load. The reality is that it doesn't. I don't care if I can grab the app's title bar and drag it across the screen during this time because that is not what my aim is. I only want the second browser to load the correct code into its DocumentText, which it does not, until it returns from the routine onto the screen. – Pete Dillman Feb 08 '15 at 12:46

2 Answers

My guess is, on the second iteration of the for loop the tcs2.Task is already in completed state, so await returns immediately and you continue processing the document which hasn't been loaded yet. The simplest way to fix this is to create tcs2 as well as documentCompletedHandler2 inside the for loop.

vadim
  • Very good point, Vadim, and I will make that edit to my code when I get it to work, however, the `for` loop may as well not be there due to the `break` I have set. I was going to need the `for` loop later after I tested the code and should have removed it for this sample. The `tcs2` never gets hit twice, and even if it did, when I had the `while` loop uncommented it should have broken out of the loop, but it didn't. – Pete Dillman Feb 08 '15 at 09:25
  • I thought `break` was there "to fall back to the screen" and that without it the code is not doing what you expect it to do. BTW, if you meant to reload the document inside that `while` loop, then you should have used the `Refresh` method instead of `Update`. `Update` just causes a control to redraw itself. – vadim Feb 08 '15 at 10:30
  • I just modified the sample above... it now has message boxes to recite the length of the DocumentText. Update or Refresh doesn't matter because either way it is waiting for the length to be greater than 10000, which isn't happening. If the `while` loop is commented out then the MessageBox will display 694. After closing that and pressing the button1_Click I get 133002 as the size. The point is that it never happens inside the routine. – Pete Dillman Feb 08 '15 at 10:45
  • Maybe it's got something to do with their JavaScript routines, but it is very bizarre... I would have thought that `tcs2` would not have fired true until all the JavaScript was finished loading the page.... – Pete Dillman Feb 08 '15 at 10:46
  • You should check that `webBrowser2.ReadyState == WebBrowserReadyState.Complete` inside the `documentCompletedHandler2` and only in that case call `TrySetResult`. It turns out the `DocumentCompleted` event may be raised multiple times if the document contains iframes. See here http://stackoverflow.com/a/9835755/1401662 – vadim Feb 08 '15 at 11:24
  • Naw, that doesn't work... there is no iFrame in this. It is – Pete Dillman Feb 08 '15 at 12:04
  • then try to replace the `Thread.Sleep` call with `await Task.Delay(TimeSpan.FromMilliseconds(500))` so that the main thread is free to do whatever webbrowser2 needs it to – vadim Feb 08 '15 at 12:52
  • You are missing the point (aside from the fact that what you recommend does not work). The `tcs2.Task` should not fire until the document has run all of its JavaScript and updated its HTML, correct? Therefore `webBrowser2.DocumentText` should be the complete HTML for the document and not some stage before the JavaScript has played with the DOM. – Pete Dillman Feb 08 '15 at 13:02
  • The only way I currently find is to compile all of the links that I need and let the app return to the form, set a timer to play every 3 or 4 seconds and then return to the list to show the user what information they desire. Sure wish this could be easier, but we are dealt what we are dealt... Thanks All for trying to help. – Pete Dillman Feb 08 '15 at 13:10

You cannot use Thread.Sleep; you need to keep pumping messages. It's tempting to use DoEvents, but that would also be wrong. Check this for some fresh ideas. – Noseratio 11 hours ago

then try to replace the Thread.Sleep call with await Task.Delay(TimeSpan.FromMilliseconds(500)) so that the main thread is free to do whatever webbrowser2 needs it to – vadim 9 hours ago

This actually does work. I don't know if maybe their AJAX server was slow last night or what the problem was, but I revisited all of these suggestions again today and this one works after about 1.5 seconds. Thanks for all the help!

I would mark as answer but it wasn't posted as one.
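
For anyone landing here later, this is roughly the shape of what worked: a minimal sketch only, not the exact code from my app. `Polling` and `WaitUntilAsync` are hypothetical names I made up for illustration (they are not part of WinForms), and the 30-second timeout is an assumption; the 10000-character threshold and the 500 ms delay come from the test loop and the comment above. The difference from the `Thread.Sleep` loop is that `await Task.Delay` hands control back to the message loop between checks, so the browser control gets a chance to finish its work.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class Polling
{
    // Hypothetical helper: re-checks `condition` every `pollInterval` until it
    // returns true (result: true) or `timeout` has elapsed (result: false).
    // Awaiting Task.Delay does not block the calling thread between checks.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan pollInterval, TimeSpan timeout)
    {
        var stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= timeout)
                return false; // gave up: the condition never became true
            await Task.Delay(pollInterval);
        }
        return true;
    }
}

// Assumed usage inside the async TryNavigate method, after `await tcs2.Task;`:
//
//     bool loaded = await Polling.WaitUntilAsync(
//         () => webBrowser2.DocumentText.Length >= 10000,
//         TimeSpan.FromMilliseconds(500),
//         TimeSpan.FromSeconds(30));
```

A length threshold is a crude "is the page done" test; checking for a specific element you expect the page's script to create would probably be more robust, but I have not tried that here.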