3

I'm trying to navigate across a website and do some work on the pages programmatically using a WebBrowser control in a Windows Form. I found this while looking for a way to block my thread until the WebBrowser's DocumentCompleted event is triggered. Given that, here's my current code:

public partial class Form1 : Form
{
    private AutoResetEvent autoResetEvent = new AutoResetEvent(false);

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        Thread workerThread = new Thread(new ThreadStart(this.DoWork));
        workerThread.SetApartmentState(ApartmentState.STA);
        workerThread.Start();
    }

    private void DoWork()
    {
        WebBrowser browser = new WebBrowser();
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
        browser.Navigate(login_page);
        autoResetEvent.WaitOne();
        // log in

        browser.Navigate(page_to_process);
        autoResetEvent.WaitOne();
        // process the page
    }

    private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        autoResetEvent.Set();
    }
}

The thread doesn't look necessary, but it will be when I expand this code to accept requests over the network (the thread will listen for connections, then process the requests). Also, I can't just put the processing code inside the DocumentCompleted handler, since I have to navigate to several different pages and do different things on each one.

Now, from what I understand, the reason this doesn't work is because the DocumentCompleted event uses the same thread that WaitOne() is being called in, so the event will not be fired until WaitOne() returns (never, in this case).

What's interesting is that if I add a WebBrowser control to the form from the toolbox (drag-and-drop), then navigate using that, this code works perfectly (with no changes other than putting the call to Navigate inside a call to Invoke - see below). But if I manually add a WebBrowser control to the Designer file, it doesn't work. And I don't really want a visible WebBrowser on my form, I just want to report the results.

public delegate void NavigateDelegate(string address);
browser.Invoke(new NavigateDelegate(this.browser.Navigate), new string[] { login_page });

My question, then, is: What's the best way to suspend the thread until the browser's DocumentCompleted event fires?

Chris

3 Answers

1

Chris,

Here is a possible implementation that solves the problem, but please look at the comments below about the issues I had to face and fix before everything worked as I expected. Here is an example of a method that performs some actions on a page in a WebBrowser (note that the WebBrowser is part of a Form in my case):

    internal ActionResponse CheckMessages() //ActionResponse is a custom class of mine that stores data extracted from pages
    {
        //go to messages
        HtmlDocument doc = WbLink.Document; //WbLink is a reference to a WebBrowser instance
        HtmlElement ele = doc.GetElementById("message_alert_box");
        if (ele == null)
            return new ActionResponse(false);

        object obj = ele.DomElement;
        System.Reflection.MethodInfo mi = obj.GetType().GetMethod("click");
        mi.Invoke(obj, new object[0]);

        semaphoreForDocCompletedEvent = WaitForDocumentCompleted();  //This is a WaitOne-like statement (1)
        if (!semaphoreForDocCompletedEvent)
            throw new Exception("Sequencing of DocumentCompleted events failed.");

        //get the list
        doc = WbLink.Document;
        ele = doc.GetElementById("mailz");
        //This is a WaitOne-like statement (2); the return value is not used here
        ele.WaitForAvailability("mailz", Program.BrowsingSystem.Document, 10000);
        ele = doc.GetElementById("mailz"); //re-fetch, since the element may only appear after the wait

        //this contains a tbody
        HtmlElement tbody = ele.FirstChild;

        //count how many elements are espionage reports; these elements are inline, so they count double together with the wrappers on top of them.
        int spioCases = 0;
        foreach (HtmlElement trs in tbody.Children)
        {
            if (trs.GetAttribute("id").ToLower().Contains("spio"))
                spioCases++;
        }

        int nMessages = tbody.Children.Count - 2 - spioCases;

        //create an array of messages to store data
        GameMessage[] archive = new GameMessage[nMessages];

        for (int counterOfOpenMessages = 0; counterOfOpenMessages < nMessages; counterOfOpenMessages++)
        {

            //open first element
            WbLink.ScriptErrorsSuppressed = true;
            ele = doc.GetElementById("mailz");
            //this contains a tbody
            tbody = ele.FirstChild;

            HtmlElement mess1 = tbody.Children[1];
            int idMess1 = int.Parse(mess1.GetAttribute("id").Substring(0, mess1.GetAttribute("id").Length - 2));
            //check whether the next element is a spio report; if it is, the element must not be opened.
            HtmlElement mess1Sibling = mess1.NextSibling;
            if (mess1Sibling.GetAttribute("id").ToLower().Contains("spio"))
            {
                //this is a wrapper for spio report
                ReadSpioEntry(archive, counterOfOpenMessages, mess1, mess1Sibling);
                //delete first in line
                DeleteFirstMessageItem(doc, ref ele, ref obj, ref mi, ref tbody);
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6); //This is a WaitOne-like statement (3)

            }
            else
            {
                //It's a normal message
                OpenMessageEntry(ref obj, ref mi, tbody, idMess1); //This opens a modal dialog over the page and does not generate a DocumentCompleted event in the WebBrowser

                //actually, opening a message generates 2 DocumentCompleted events without any Navigating event being raised
                //Application.DoEvents();
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);

                //read element
                ReadMessageEntry(archive, counterOfOpenMessages);

                //close current message
                CloseMessageEntry(ref ele, ref obj, ref mi);  //this closes a modal dialog, so no DocumentCompleted is raised afterwards!
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
                //delete first in line
                DeleteFirstMessageItem(doc, ref ele, ref obj, ref mi, ref tbody); //this closes a modal dialog, so no DocumentCompleted is raised afterwards!
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
            }
        }
        return new ActionResponse(true, archive);
    }

In practice, this method takes a page of an MMORPG, reads the messages sent to the account by other players, and stores them in the ActionResponse class via the ReadMessageEntry method.

Apart from the implementation and the logic of the code, which are really case-dependent (and not useful to you), there are a few interesting elements worth noting for your case. I put some comments in the code and highlighted 3 important points [with the symbols (1), (2) and (3)].

The algorithm is:

1) Arrive at a page

2) Get the underlying Document from the WebBrowser

3) Find an element to click to get to the messages page [done with: HtmlElement ele = doc.GetElementById("message_alert_box");]

4) Trigger the click event on it via the MethodInfo instance and a reflection-based call [this loads another page, so a DocumentCompleted will arrive sooner or later]

5) Wait for DocumentCompleted to be raised and then proceed [done with: semaphoreForDocCompletedEvent = WaitForDocumentCompleted(); at point (1)]

6) Fetch the new Document from the WebBrowser after the page has changed

7) Find a particular anchor on the page that marks where the messages I want to read are

8) Make sure that this tag is present in the page (as some AJAX call might delay the content I want to read) [done with: ele.WaitForAvailability("mailz", Program.BrowsingSystem.Document, 10000), which is point (2)]

9) Run the whole loop for reading each message, which means opening a modal dialog that is on the same page and therefore does not generate a DocumentCompleted, reading it when ready, closing it, and looping again. For this particular case I use an overload of (1), called as semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6); at point (3)

Now the three methods I use to pause, check and read:

(1) To wait until DocumentCompleted is raised, without overloading the DocumentCompleted handler, which may be used for more than one purpose (as in your case):

private bool WaitForDocumentCompleted()
{
    Thread.SpinWait(1000);  //This is dirty, but it works
    //BrowsingSystem is another reference to the browser, made public in my Form. IsBusy is just a bool
    //set to true when the Navigating event is raised and set to false when DocumentCompleted fires.
    while (Program.BrowsingSystem.IsBusy)
    {
        Application.DoEvents();
        Thread.SpinWait(1000);
    }

    //IsInfoAvailable is just a get property that wraps webBrowser.Document inside a lock statement
    //to protect it from concurrent access.
    return Program.BrowsingSystem.IsInfoAvailable;
}

(2) Wait for a particular tag to be available in the page:

public static bool WaitForAvailability(this HtmlElement tag, string id, HtmlDocument documentToExtractFrom, long maxCycles)
        {
            bool cond = true;
            long counter = 0;
            while (cond)
            {
                Application.DoEvents(); //VERIFY find a way to get rid of this rubbish (original: "trovare un modo per rimuovere questa porcheria")
                tag = documentToExtractFrom.GetElementById(id);
                if (tag != null)
                    cond = false;
                Thread.Yield();
                Thread.SpinWait(100000);
                counter++;
                if (counter > maxCycles)
                    return false;
            }
            return true;
        }

(3) The dirty trick to wait for a DocumentCompleted that will never arrive, because no frames need to reload on the page!

private bool WaitForDocumentCompleted(int seconds)
    {
        int counter = 0;
        while (Program.BrowsingSystem.IsBusy)
        {
            Application.DoEvents();
            Thread.Sleep(1000);
            if (counter == seconds)
            {
                return true;
            }
            counter++;
        }
        return true;
    }
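
The two WaitForDocumentCompleted variants above could be folded into a single polling helper with an overall timeout. The following is only a minimal sketch under assumptions: `PumpWait`, `Until` and their parameters are hypothetical names, not part of my original code. The helper takes the "am I done?" check and the message pump as delegates, so in a WinForms application you would pass `Application.DoEvents` as the pump. It does not remove the DoEvents() drawbacks, it only bounds the wait.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PumpWait
{
    // Polls 'isDone' until it returns true or 'timeout' elapses.
    // 'pump' runs on every iteration so that pending window messages
    // (and therefore DocumentCompleted) can be delivered while waiting.
    // Returns true if 'isDone' became true, false on timeout.
    public static bool Until(Func<bool> isDone, Action pump, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed >= timeout)
                return false;

            pump();            // e.g. Application.DoEvents in WinForms
            Thread.Sleep(10);  // short pause instead of a busy spin
        }
        return true;
    }
}
```

With the names from my code, a call could look like `PumpWait.Until(() => !Program.BrowsingSystem.IsBusy, Application.DoEvents, TimeSpan.FromSeconds(6))`.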

Here are also the DocumentCompleted and Navigating handlers, to give you the whole picture of how I used them.

private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            if (Program.BrowsingSystem.BrowserLink.ReadyState == WebBrowserReadyState.Complete)
            {
                lock (Program.BrowsingSystem.BrowserLocker)
                {
                    Program.BrowsingSystem.ActualPosition = Program.BrowsingSystem.UpdatePosition(Program.BrowsingSystem.Document);
                    Program.BrowsingSystem.CheckContentAvailability();
                    Program.BrowsingSystem.IsBusy = false;
                }
            }
        }

private void webBrowser_Navigating(object sender, WebBrowserNavigatingEventArgs e)
        {
            lock (Program.BrowsingSystem.BrowserLocker)
            {
                Program.BrowsingSystem.ActualPosition.PageName = OgamePages.OnChange;
                Program.BrowsingSystem.IsBusy = true;
            }
        }

Please have a look here to learn about the mess behind DoEvents() if you're not aware of the details that lie behind the implementation presented here (I hope it is not a problem to link other sites from Stack Overflow).

A small final note on why you need to put the call to your Navigate method inside an Invoke when you use it from a Form instance: you need the Invoke because the methods that work on the WebBrowser (or even hold it in scope as a referenced variable) must run on the same thread as the WebBrowser itself!
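
As a rough illustration of that rule, the marshalling can be wrapped in a small extension method. This is a sketch, not code from my project: `ControlExtensions` and `RunOnOwnerThread` are hypothetical names, while `Control.InvokeRequired` and `Control.Invoke` are the standard Windows Forms members.

```csharp
using System;
using System.Windows.Forms;

public static class ControlExtensions
{
    // Runs 'action' on the thread that owns 'control'.
    // If the caller is already on that thread, the action runs directly;
    // otherwise it is marshalled with Control.Invoke, which blocks the
    // caller until the owning thread has executed it.
    public static void RunOnOwnerThread(this Control control, Action action)
    {
        if (control.InvokeRequired)
            control.Invoke(action);
        else
            action();
    }
}
```

From a worker thread the navigation would then be written as `browser.RunOnOwnerThread(() => browser.Navigate(login_page));`. Note that the owning thread still has to pump messages for the marshalled call to run.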

Moreover, if the WebBrowser is a child of some kind of Form container, the thread that instantiates it must be the same thread that created the Form, and by transitivity all the methods that work on the WebBrowser need to be called on the Form's thread (in your case, the Invoke relocates your calls onto the Form's native thread). I hope this is useful for you (I left a //VERIFY comment in the code in my native language to let you know what I think about Application.DoEvents()).

Kind regards, Alex

Pinoba
0

HAH! I had the same question. You can do this with event handling. If you stop a thread midway through the page, it will need to wait until the page finishes. You can easily do this by attaching

 Page.LoadComplete += new EventHandler(triggerFunction);

In the triggerFunction you can do this

void triggerFunction(object sender, EventArgs e)
{
     autoResetEvent.Reset();
}

Let me know if this works. I ended up not using threads in mine and instead just put the work into triggerFunction. Some syntax might not be 100% correct because I am answering off the top of my head.

Serguei Fedorov
  • This is basically what I have, except that I make a direct call to autoResetEvent.Set() (autoResetEvent.Reset() will keep blocking, not unblock) instead of using the intermediate triggerFunction. However, this code does not work (again, from what I understand it is because the EventHandler is executed in the same thread as the initial call, so if the thread is blocking indefinitely, the event will never fire). – Chris Jul 03 '12 at 19:40
  • Something seems to be wrong with your main page thread which is preventing it from finishing executing. If you suspend your thread, the page should just keep rendering (hence it being a thread). Maybe I just showed the wrong function calls? However, if anything, you want to release your thread on your event trigger, since it guarantees that the page has finished rendering (at least the server thinks it has). – Serguei Fedorov Jul 04 '12 at 01:54
  • I've edited my post to make the DocumentCompleted event handling more clear. Clearly something is wrong with the way the thread is executing, but I can't figure out what it is. – Chris Jul 05 '12 at 15:46
  • It looks like your AutoResetEvent Set() function is called once and then it's put into a wait state again. As far as I understand, you have to actually release all your inner threads for the server to give the page to the browser. How are you ending the second WaitOne? Can you give me a better idea of how far your code gets before getting stuck? Does it get past the first WaitOne? – Serguei Fedorov Jul 05 '12 at 16:21
  • Yes, calling Set() on an [AutoResetEvent](http://msdn.microsoft.com/en-us/library/system.threading.autoresetevent.aspx) will release one waiting thread then go back into an unsignaled state. This is exactly what I want since I only have one thread waiting. I don't really understand what you meant by "inner threads". The second WaitOne() should be released in the same manner as the first - by the call to Set() inside the browser_DocumentCompleted method. The first WaitOne() blocks forever as written; if I put a timeout, it always times out. The Set() call is never made. – Chris Jul 06 '12 at 17:13
  • Strange, it seems like it only triggers that event once. By inner threads I mean the threads you start inside the form. This is really stupid, but could you try to reattach the event handler? It could be that the event handler is dumped after the page has finished rendering, since you're calling more than one page? – Serguei Fedorov Jul 06 '12 at 18:03
  • The event is never triggered. More specifically, as written the event is never triggered. If I add a timeout parameter to the WaitOne() method, the event fires - but only after the WaitOne() times out. – Chris Jul 06 '12 at 18:15
0

EDIT

Register the handler in the InitializeComponent method like this, instead of in the same method:

WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);

ReadyState will tell you the progress of the document loading when checked in the DocumentCompleted event.

void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (browser.ReadyState == WebBrowserReadyState.Complete)
    {

    }
}
Ebad Masood
  • If `browser.ReadyState != WebBrowserReadyState.Complete`, you've got an infinite loop. `ReadyState` isn't going to change. The only way this can work is if `browser.ReadyState` will already be `WebBrowserReadyState.Complete` (which would make sense: it's the `DocumentCompleted` event), but in that case, you don't need the loop. –  Jul 03 '12 at 19:12
  • As I mentioned, the DocumentCompleted event never fires, since it is executed in the same thread as the call to Navigate() and therefore will only be executed after WaitOne() returns (which it never does, since the Set() method call is never made). – Chris Jul 03 '12 at 19:42
  • I've tried manually adding a WebBrowser into the InitializeComponent() method, but the results are the same. It works if I let Visual Studio create the code, though (by dragging and dropping a WebBrowser control from the toolbox into the form), but I don't want a visual WebBrowser control on my form. I've determined that what makes the difference is this line of code: `this.Controls.Add(this.browser);` in the InitializeComponent() method. If I add this line, creating the web browser manually works. However, this also places the WebBrowser control at the default location on the form. – Chris Jul 05 '12 at 15:39
  • 1- You can try doing in Form_Load outside Initialize method. – Ebad Masood Jul 05 '12 at 18:09
  • 2- If the control appears in default location can't you adjust it? – Ebad Masood Jul 05 '12 at 18:16