To collect information on a webpage, I can use the WebBrowser.Navigated event.

First, navigate to the URL:

WebBrowser wbCourseOverview = new WebBrowser();
wbCourseOverview.ScriptErrorsSuppressed = true;
// Subscribe before navigating so the event cannot be missed.
wbCourseOverview.Navigated += wbCourseOverview_Navigated;
wbCourseOverview.Navigate(url);

Then process the webpage when Navigated is raised:

void wbCourseOverview_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
    // Find the control and invoke its "Click" event...
}

The difficult part comes when I try to go through a string array of URLs:

foreach (var u in courseUrls)
{
    WebBrowser wbCourseOverview = new WebBrowser();
    wbCourseOverview.ScriptErrorsSuppressed = true;
    wbCourseOverview.Navigated += wbCourseOverview_Navigated;
    wbCourseOverview.Navigate(u);
}

Here, because the page load takes time, wbCourseOverview_Navigated is never reached.

I tried to use async/await from C# 5. Tasks and the Event-based Asynchronous Pattern (EAP) are covered here, and another example can be found in The Task-based Asynchronous Pattern.

The problem is that WebClient has async methods such as DownloadDataAsync and DownloadStringAsync, but WebBrowser has no NavigateAsync.

Can any expert give me some advice? Thank you.


There is a related post on Stack Overflow (here), but does anyone know how to implement the struct described in its answer?


Update again.

Another post here on Stack Overflow suggests this extension method:

public static Task WhenDocumentCompleted(this WebBrowser browser)
{
    var tcs = new TaskCompletionSource<bool>();
    browser.DocumentCompleted += (s, args) => tcs.SetResult(true);
    return tcs.Task;
}

So I have:

foreach (var c in courseBriefs)
    {
        wbCourseOverview.Navigate(c.Url);
        await wbCourseOverview.WhenDocumentCompleted();
    }

It works until the web browser visits the second URL, where it throws:

An attempt was made to transition a task to a final state when it had already completed.

I know I must have made a mistake inside the foreach loop, probably because the DocumentCompleted event has not yet been raised when the loop reaches its second iteration. What is the correct way to write this await in a foreach loop?
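A note on the exception above: each call to WhenDocumentCompleted adds another handler to the same browser and never removes it, so the handler from the first iteration is still attached when the second page finishes and calls SetResult on a task that has already completed. Below is a minimal sketch of one way around this, assuming a single WebBrowser reused across the loop; the class name WebBrowserExtensions is made up for illustration. The handler detaches itself and uses TrySetResult, which does not throw on an already completed task.

```csharp
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical container class; extension methods must live in a static class.
public static class WebBrowserExtensions
{
    // Returns a task that completes on the next DocumentCompleted event.
    public static Task WhenDocumentCompleted(this WebBrowser browser)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (s, args) =>
        {
            // Detach first so later navigations cannot reach this finished task.
            browser.DocumentCompleted -= handler;
            // TrySetResult returns false instead of throwing if already completed.
            tcs.TrySetResult(true);
        };
        browser.DocumentCompleted += handler;
        return tcs.Task;
    }
}
```

Inside the loop it may also be safer to subscribe before navigating, for example `Task loaded = wbCourseOverview.WhenDocumentCompleted(); wbCourseOverview.Navigate(c.Url); await loaded;`. Note that DocumentCompleted can fire more than once for pages containing frames, so this sketch completes on the first event it sees, which is not necessarily the top-level document.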

Blaise
  • Are you trying to scrape HTML from the `WebBrowser` control? If so, that is a very inefficient method, as there is a lot of overhead from loading all of the images, JavaScript and plugins. You could process the HTTP requests yourself and do something with the response afterwards. – Cameron Tinker Apr 10 '13 at 18:26
  • If you read the article you linked to, then you should be able to build `NavigateAsync()` by yourself, using `Navigate()`, `Navigated` and `TaskCompletionSource`. – svick Apr 10 '13 at 19:32
  • @CameronTinker, I need more than just the HTML. What I want is to invoke Click events on some DOM controls, so I cannot just use `DownloadStringTaskAsync`. – Blaise Apr 12 '13 at 16:30
  • @svick, I think you are pointing in a good direction; this is what I wanted to do. Can you please give me more instructions? And why do you suggest using `Navigated` instead of `DocumentCompleted` to build `NavigateAsync()`? – Blaise Apr 12 '13 at 16:31
  • @Blaise I was suggesting `Navigated` simply because that's what you used in your original code. – svick Apr 12 '13 at 18:28
  • Also, if you have a new question, *ask a new question*. Editing a question is meant for clarifying a question, not asking something completely different. – svick Apr 12 '13 at 19:08

2 Answers


There is a post in StackOverflow (here). But, does anyone know how to implement that struct in its answer?

OK, so you want some code with an awaiter. I've written two samples. The first one uses the TPL's built-in awaiter:

public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            ProcessUrlsAsync(new[] { "http://google.com", "http://microsoft.com", "http://yahoo.com" })
                .Start();
        }

        private Task ProcessUrlsAsync(string[] urls)
        {
            return new Task(() =>
            {
                foreach (string url in urls)
                {
                    TaskAwaiter<string> awaiter = ProcessUrlAsync(url);
                    // or the next line, in case we use method *
                    // TaskAwaiter<string> awaiter = ProcessUrlAsync(url).GetAwaiter();                     
                    string result = awaiter.GetResult();

                    MessageBox.Show(result);
                }
            });
        }        

        // Awaiter inside
        private TaskAwaiter<string> ProcessUrlAsync(string url)
        {
            TaskCompletionSource<string> taskCompletionSource = new TaskCompletionSource<string>();
            var handler = new WebBrowserDocumentCompletedEventHandler((s, e) =>
            {
                // TODO: put custom processing of document right here
                taskCompletionSource.SetResult(e.Url + ": " + webBrowser1.Document.Title);
            });
            webBrowser1.DocumentCompleted += handler;
            taskCompletionSource.Task.ContinueWith(s => { webBrowser1.DocumentCompleted -= handler; });

            webBrowser1.Navigate(url);
            return taskCompletionSource.Task.GetAwaiter();
        }

        // (*) Task<string> instead of Awaiter
        //private Task<string> ProcessUrlAsync(string url)
        //{
        //    TaskCompletionSource<string> taskCompletionSource = new TaskCompletionSource<string>();
        //    var handler = new WebBrowserDocumentCompletedEventHandler((s, e) =>
        //    {
        //        taskCompletionSource.SetResult(e.Url + ": " + webBrowser1.Document.Title);
        //    });
        //    webBrowser1.DocumentCompleted += handler;
        //    taskCompletionSource.Task.ContinueWith(s => { webBrowser1.DocumentCompleted -= handler; });

        //    webBrowser1.Navigate(url);
        //    return taskCompletionSource.Task;
        //}
    }

The next sample contains an implementation of the awaiter struct Eric Lippert was talking about here.

public partial class Form1 : Form
    {
        public struct WebBrowserAwaiter
        {
            private readonly WebBrowser _webBrowser;
            private readonly string _url;

            private readonly TaskAwaiter<string> _innerAwaiter;

            public bool IsCompleted
            {
                get
                {
                    return _innerAwaiter.IsCompleted;
                }
            }

            public WebBrowserAwaiter(WebBrowser webBrowser, string url)
            {
                _url = url;
                _webBrowser = webBrowser;
                _innerAwaiter = ProcessUrlAwaitable(_webBrowser, url);
            }

            public string GetResult()
            {
                return _innerAwaiter.GetResult();

            }

            public void OnCompleted(Action continuation)
            {
                _innerAwaiter.OnCompleted(continuation);
            }

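            // Static, because a struct constructor cannot call instance members before all fields are assigned.
            private static TaskAwaiter<string> ProcessUrlAwaitable(WebBrowser webBrowser, string url)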
            {
                TaskCompletionSource<string> taskCompletionSource = new TaskCompletionSource<string>();
                var handler = new WebBrowserDocumentCompletedEventHandler((s, e) =>
                {
                    // TODO: put custom processing of document here
                    taskCompletionSource.SetResult(e.Url + ": " + webBrowser.Document.Title);
                });
                webBrowser.DocumentCompleted += handler;
                taskCompletionSource.Task.ContinueWith(s => { webBrowser.DocumentCompleted -= handler; });

                webBrowser.Navigate(url);
                return taskCompletionSource.Task.GetAwaiter();
            }
        }

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            ProcessUrlsAsync(new[] { "http://google.com", "http://microsoft.com", "http://yahoo.com" })
                .Start();
        }

        private Task ProcessUrlsAsync(string[] urls)
        {
            return new Task(() =>
            {
                foreach (string url in urls)
                {
                    var awaiter = new WebBrowserAwaiter(webBrowser1, url);
                    string result = awaiter.GetResult();

                    MessageBox.Show(result);
                }
            });
        }
    }

Hope this helps.
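For comparison, here is a minimal sketch of the same loop written with the async and await keywords, built on the Task<string> variant marked (*) above. It is an illustration under assumptions rather than a tested drop-in: the BrowserForm class and its field setup are hypothetical, and the handler completes on the first DocumentCompleted event, which may belong to a frame rather than the top-level page.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical form for illustration; the names are not from the samples above.
public class BrowserForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser
    {
        Dock = DockStyle.Fill,
        ScriptErrorsSuppressed = true
    };

    public BrowserForm()
    {
        Controls.Add(webBrowser1);
        // Start once the form is visible; the async lambda acts as the event handler.
        Shown += async (s, e) =>
            await ProcessUrlsAsync(new[] { "http://google.com", "http://microsoft.com", "http://yahoo.com" });
    }

    private async Task ProcessUrlsAsync(string[] urls)
    {
        foreach (string url in urls)
        {
            // Control returns to the message loop while the page loads,
            // so everything stays on the UI thread without blocking it.
            string result = await ProcessUrlAsync(url);
            MessageBox.Show(result);
        }
    }

    private Task<string> ProcessUrlAsync(string url)
    {
        var tcs = new TaskCompletionSource<string>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (s, e) =>
        {
            // Detach so the next navigation does not complete this task again.
            webBrowser1.DocumentCompleted -= handler;
            tcs.TrySetResult(e.Url + ": " + webBrowser1.Document.Title);
        };
        webBrowser1.DocumentCompleted += handler;
        webBrowser1.Navigate(url);
        return tcs.Task;
    }
}
```

Because the loop awaits instead of calling GetResult(), it does not need a separate Task started with Start(), and the WebBrowser is only ever touched from the thread that created it.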

alex.b
  • Thanks for the help. It seems you are not using the `await` or `async` keyword in this solution. Is there any reason you avoid them? Are they not easy to use in this situation? – Blaise Apr 12 '13 at 16:45
  • This answer is so detailed. Now I at least have a working example. Thank you. – Blaise Apr 12 '13 at 16:54

Instead of wbCourseOverview_Navigated, use webBrowser1_DocumentCompleted: when the first URL has finished loading, do your work and then navigate to the next URL.

List<string> urls = new List<string>();
int count = 0;

public Form1()
{
    InitializeComponent();
    webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
}

private void Form1_Load(object sender, EventArgs e)
{
    // Start with the first URL, if there is one.
    if (count < urls.Count)
        webBrowser1.Navigate(urls[count++]);
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Do something with the loaded document

    // Move on to the next URL; stop when the list is exhausted.
    if (count < urls.Count)
        webBrowser1.Navigate(urls[count++]);
}
KF2
  • Ha! This answer makes me feel stupid again. Why do I have to stick with my `foreach` loop?! I will wait for some more answers before marking one as accepted, just to see if someone can offer us an `async-await` solution, since I really want to know. But thank you IRSOG for the quick reply. – Blaise Apr 10 '13 at 17:51
  • @Blaise: there is a method for an async/await implementation of the WebBrowser class for .NET, check it also: http://stackoverflow.com/questions/8610197/async-await-implementation-of-webbrowser-class-for-net – KF2 Apr 10 '13 at 18:06
  • Yes, I already starred that post. But do you have any idea how to implement that struct? – Blaise Apr 10 '13 at 18:08
  • I haven't tried that before, but see this also: http://stackoverflow.com/questions/14836416/async-method-with-completed-event – KF2 Apr 10 '13 at 18:19
  • I tried to use the solution offered in the last post. But there is some difficulty also. Please see my update in the original post. Thanks. – Blaise Apr 12 '13 at 16:42