
I am currently reading the HTML source of pages from a list of URLs. Each page uses JavaScript to load a specific span containing a dynamic hyperlink that I need to extract. Everything works fine except for two small bugs that occur intermittently but can be dealt with during debugging:

  1. When arriving at the DocumentCompleted event sometimes the Document.Body is null

  2. When t.Join() is called sometimes the program will hang for a long period of time.

    using System.Threading;
    using System.Windows.Forms;

    public class WebProcessor
    {
        private string GeneratedSource { get; set; }
        private string URL { get; set; }

        public string GetGeneratedHTML(string url)
        {
            URL = url;

            Thread t = new Thread(new ThreadStart(WebBrowserThread));
            t.SetApartmentState(ApartmentState.STA);
            t.Start();
            // When GetGeneratedHTML() is called more than once there is a chance the program
            // will hang indefinitely here, maybe even deadlock??
            t.Join();
            return GeneratedSource;
        }

        private void WebBrowserThread()
        {
            WebBrowser wb = new WebBrowser();
            wb.Navigate(URL);
            wb.DocumentCompleted +=
                new WebBrowserDocumentCompletedEventHandler(
                    wb_DocumentCompleted);

            // Pump messages until the browser reports that loading is complete
            while (wb.ReadyState != WebBrowserReadyState.Complete)
                Application.DoEvents();
            wb.Dispose();
        }

        private void wb_DocumentCompleted(object sender,
            WebBrowserDocumentCompletedEventArgs e)
        {
            if (((WebBrowser)sender).Document.Body != null)
            {
                GeneratedSource = ((WebBrowser)sender).Document.Body.InnerHtml;
            }
            else
            {
                // Handle when Document isn't fully loaded
            }
        }
    }
    
TaylorM
  • Don't use `DoEvents`. Check [this code](http://stackoverflow.com/a/22262976/1768303) for some fresh ideas. – noseratio Jun 23 '15 at 19:41

1 Answer


The following links may help in resolving the issue. Application.DoEvents() is generally not considered good practice, and there is a good amount of discussion about it and its replacements:

Use of Application.DoEvents()

Alternative to Application.DoEvents()

Do Events Evil

My understanding is that DoEvents plays a role in Document.Body turning up null.

Regarding Join(), that is its role: it blocks until the thread returns, so it hangs whenever the loop in WebBrowserThread never exits. The STA apartment state is required for COM components that can only operate in STA mode, and that includes the ActiveX control that WebBrowser wraps. You may want to check the following link on doing the same with async/await, which is much better at keeping the UI thread free and will make your UI far more responsive:

is there an Application.DoEvents() for WebBrowser?

Creating threads by hand is rarely necessary nowadays. Prefer the Task APIs where you can, as they do a much better job in terms of parallelization.
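To make the async/await suggestion concrete, here is a minimal sketch, not a definitive implementation. It wraps the navigation in a `Task<string>` and replaces the `DoEvents` loop with a real message loop (`Application.Run`). The names `StaBrowser` and `GetGeneratedHtmlAsync` are hypothetical and chosen only for illustration. A dedicated thread is still used here because `WebBrowser` needs STA and thread-pool threads are MTA. The sketch assumes C# 5 / .NET 4.5 and has no timeout, so the task never completes if `DocumentCompleted` never fires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper, for illustration only.
public static class StaBrowser
{
    public static Task<string> GetGeneratedHtmlAsync(string url)
    {
        var tcs = new TaskCompletionSource<string>();

        var thread = new Thread(() =>
        {
            try
            {
                string html = null;
                using (var wb = new WebBrowser())
                {
                    wb.ScriptErrorsSuppressed = true;

                    // Subscribe before navigating so the event cannot be missed.
                    wb.DocumentCompleted += (sender, e) =>
                    {
                        // The event also fires for frames; wait for the whole page.
                        if (wb.ReadyState != WebBrowserReadyState.Complete) return;

                        HtmlElement body = wb.Document != null ? wb.Document.Body : null;
                        html = body != null ? body.InnerHtml : null;

                        // Ends the message loop started by Application.Run() below.
                        Application.ExitThread();
                    };

                    wb.Navigate(url);
                    Application.Run(); // message loop for this STA thread
                }
                // Complete the task only after the browser has been disposed.
                tcs.TrySetResult(html);
            }
            catch (Exception ex)
            {
                tcs.TrySetException(ex);
            }
        });

        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();
        return tcs.Task;
    }
}
```

From an async method the call would look like `string html = await StaBrowser.GetGeneratedHtmlAsync(url);`, which avoids blocking on `Join()`. A `null` result means the body was not available when loading completed. Content that scripts inject after `DocumentCompleted` may still be missing, which this sketch does not address.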

Mrinal Kamboj
  • Thank you for your response. I'm currently using .NET 4.5 and have never used any of the Task APIs. I've looked at the articles linked and noticed 'AutoResetEvent' to be a possible solution to replace the 'Application.DoEvents()' – TaylorM Jun 23 '15 at 13:49
  • AutoResetEvent is also a signaling mechanism, which is again an old approach. .NET 4.5 includes async/await and Tasks to do the work either asynchronously or in parallel on thread-pool threads; they are easy to use and need very little code – Mrinal Kamboj Jun 23 '15 at 18:03
  • I've begun implementing the Async-Await and Tasks but am having difficulties. – TaylorM Jun 23 '15 at 18:46
  • Post the issues you are hitting and I can help, but this is the way to go – Mrinal Kamboj Jun 24 '15 at 07:19