I am developing a Windows application for web scraping. To do this, I use the WebBrowser control. I can't use the WebRequest/WebClient/WebResponse classes because the web pages are loaded dynamically using JavaScript.
The application works fine, but because I do a lot of processing, it puts unnecessary load on the UI thread, and I intermittently get the "Not Responding" message. So what I did is:
1. Create the WebBrowser on the UI thread.
2. Put the long-running processes on a background thread.
3. Whenever I need to get the page's document, I call Control.Invoke.
4. Return the page's document via the Invoke call to the background thread.
In the callback function, I can see that the page's document is extracted fine. However, the document (HtmlDocument) returned to the background worker is not evaluated correctly. When I step through it in the debugger, I get a "Function evaluation timed out" message. I've played around with the syntax and keep getting either an invalid cast exception or a cross-threading exception.
Below is how I've coded the callback/delegate:
    private delegate HtmlDocument RefreshDelegate();

    private HtmlDocument RefreshBrowser()
    {
        WebBrowser br1 = (WebBrowser)this.Controls["br1"]; // get the WebBrowser named "br1"
        br1.Refresh();                                     // refresh the browser
        return br1.Document;                               // is retrieved correctly here
    }
Now for the code in the background worker that processes the "returned" HtmlDocument:
    WebBrowser br1 = (WebBrowser)this.Controls["br1"]; // get the browser
    HtmlDocument document = (HtmlDocument)br1.Invoke(new RefreshDelegate(this.RefreshBrowser)); // not evaluated
    // do stuff with document
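
For comparison, here is a minimal sketch of one variation I could try. It assumes the problem is that the HtmlDocument is tied to the UI thread, so the delegate hands back the page's HTML as a plain string instead of the HtmlDocument itself. The names ScrapeForm, GetHtmlDelegate, GetPageHtml and the URL are hypothetical placeholders, not part of my actual application.

```csharp
using System;
using System.ComponentModel;
using System.Windows.Forms;

// Hypothetical form used only to illustrate the idea.
public class ScrapeForm : Form
{
    private readonly WebBrowser br1 = new WebBrowser { Name = "br1", Dock = DockStyle.Fill };
    private readonly BackgroundWorker worker = new BackgroundWorker();

    // Returns a string rather than an HtmlDocument.
    private delegate string GetHtmlDelegate();

    public ScrapeForm()
    {
        Controls.Add(br1);
        worker.DoWork += Worker_DoWork;
        br1.DocumentCompleted += Browser_DocumentCompleted;
        br1.Navigate("http://example.com/"); // placeholder URL (assumption)
    }

    // Raised on the UI thread; may fire more than once for pages with frames.
    private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (!worker.IsBusy)
        {
            worker.RunWorkerAsync();
        }
    }

    // Runs on the UI thread because it is called through Control.Invoke.
    private string GetPageHtml()
    {
        HtmlDocument doc = br1.Document;
        if (doc == null || doc.Body == null)
        {
            return string.Empty;
        }
        return doc.Body.OuterHtml; // snapshot of the current DOM as text
    }

    // Runs on the background thread.
    private void Worker_DoWork(object sender, DoWorkEventArgs e)
    {
        string html = (string)br1.Invoke(new GetHtmlDelegate(GetPageHtml));
        // The long-running parsing would work on the string here,
        // without touching br1 or its HtmlDocument again.
        e.Result = html.Length;
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ScrapeForm());
    }
}
```

In this sketch only a string crosses the thread boundary, so the background worker never calls into the browser's objects. It does not address whether the page's scripts have finished running by the time DocumentCompleted fires.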
Debugger message encountered: "Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation."
Is this the correct way to solve this problem? As I said, I can't get the JavaScript-generated content with WebRequest etc., and I can't run the HtmlDocument parsing on the UI thread because it results in a poor user experience. Additionally, I sometimes need to create several WebBrowser instances. If this is not the best way, I'm open to other libraries as well. Thanks.