Trying to wrap my head around TaskCompletionSource. Here is a little class I wrote to synchronously (WebBrowser.Navigate() is async) download a webpage and return it to the caller. I'm not sure if I have used TaskCompletionSource correctly. Can someone please indicate what I'm missing here, or if this is entirely an over-engineered solution?

class PageDownloader
{
  private WebBrowser _WB = new WebBrowser();
  private TaskCompletionSource<bool> tcs = new TaskCompletionSource<bool>();

  public PageDownloader()
  {
    _WB.LoadCompleted += _WB_LoadCompleted;
  }

  public string Download(string url)
  {
    _WB.Navigate(new Uri(url));

    tcs.Task.Wait();

    if (tcs.Task.IsCanceled || tcs.Task.IsFaulted)
      return null;
    else
      return (_WB.Document as mshtml.HTMLDocument).body.innerHTML;
  }

  private void _WB_LoadCompleted(object sender, System.Windows.Navigation.NavigationEventArgs e)
  {
    var docTemp = _WB.Document as mshtml.HTMLDocument;
    foreach (mshtml.IHTMLImgElement imgElemt in docTemp.images)
      imgElemt.src = "";

    tcs.SetResult(true);
  }
}
dotNET
  • Shouldn't the Download method be async so you can actually await it? – mm8 Aug 29 '17 at 13:34
  • @mm8: No. That's the whole point of the class. I'm trying to create a "sync" version of it. – dotNET Aug 29 '17 at 13:35
  • Then you might as well handle the LoadCompleted event of the WebBrowser Control directly, can't you? – mm8 Aug 29 '17 at 13:37
  • @mm8: Yeah and I actually am handling it above. The only problem is to somehow hold the code immediately after the `Navigate()` line and let it through only after the `LoadCompleted` event handler has completed (no puns). I was hoping to use `TaskCompletionSource` for that purpose. – dotNET Aug 29 '17 at 13:52
  • You can't "hold" when calling Navigate() since this method doesn't return anything. That's why you should await your Download method. See my answer. – mm8 Aug 29 '17 at 13:59

2 Answers

I am not sure what you mean by async here, but the WebBrowser.Navigate method simply returns void and cannot be awaited using the async/await keywords that were introduced in C# 5. It kicks off a navigation operation and returns immediately, so you should handle the LoadCompleted event if you want to do something once the navigation has actually completed. So far so good.

Using a TaskCompletionSource<T>, you could make the Download method in your class async so that you can actually await the result. You probably also want to catch any exception that may occur in your LoadCompleted event handler:

class PageDownloader
{
    private WebBrowser _WB = new WebBrowser();
    private TaskCompletionSource<string> tcs = new TaskCompletionSource<string>();

    public PageDownloader()
    {
        _WB.LoadCompleted += _WB_LoadCompleted;
    }

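    public async Task<string> DownloadAsync(string url)
    {
        // A TaskCompletionSource completes only once, so use a fresh one per call.
        tcs = new TaskCompletionSource<string>();

        _WB.Navigate(new Uri(url));

        // Awaiting rethrows if the task faulted or was canceled,
        // so no IsCanceled/IsFaulted check is needed afterwards.
        return await tcs.Task.ConfigureAwait(false);
    }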

    private void _WB_LoadCompleted(object sender, System.Windows.Navigation.NavigationEventArgs e)
    {
        try
        {
            var docTemp = _WB.Document as mshtml.HTMLDocument;
            foreach (mshtml.IHTMLImgElement imgElemt in docTemp.images)
                imgElemt.src = "";

            tcs.SetResult(docTemp.body.innerHTML);
        }
        catch(Exception ex)
        {
            tcs.SetException(ex);
        }
    }
}

Usage:

PageDownloader downloader = new PageDownloader();
string html = await downloader.DownloadAsync("http://stackoverflow.com");
// or, if you want to block synchronously instead:
// string html = downloader.DownloadAsync("http://stackoverflow.com").Result;

You could perhaps also add a synchronous wrapper to your class:

public string Download(string url)
{
    return DownloadAsync(url).Result;
}
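
The event-to-task pattern used above can be tried in isolation, without a WebBrowser. The following is only a minimal console sketch: `FakeLoader`, its `Loaded` event, and the `LoadAsync` extension method are hypothetical names invented for illustration and are not part of any framework. It assumes .NET 4.5 or later and C# 7.1 or later (for `async Main`).

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a component that signals completion through an event.
class FakeLoader
{
    public event EventHandler<string> Loaded;

    // Returns immediately; raises Loaded a little later from a thread-pool thread.
    public void Start(string url)
    {
        Task.Run(async () =>
        {
            await Task.Delay(100);
            Loaded?.Invoke(this, "<html>" + url + "</html>");
        });
    }
}

static class LoaderExtensions
{
    // Wraps one Start/Loaded round trip in a Task, using a fresh TaskCompletionSource.
    public static Task<string> LoadAsync(this FakeLoader loader, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        EventHandler<string> handler = null;
        handler = (sender, html) =>
        {
            loader.Loaded -= handler;   // one-shot: detach before completing the task
            tcs.TrySetResult(html);
        };
        loader.Loaded += handler;
        loader.Start(url);
        return tcs.Task;
    }
}

static class Program
{
    static async Task Main()
    {
        var loader = new FakeLoader();
        string html = await loader.LoadAsync("http://example.com");
        Console.WriteLine(html);
    }
}
```

Creating the TaskCompletionSource inside the method, rather than holding it in a field, keeps each call independent of the previous one.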
mm8
  • Maybe I was too quick to click. Your code hits the `await` line but never moves forward from there. Can you give it a quick test? – dotNET Aug 29 '17 at 14:27
  • Which await line? Does the SetResult method get called? – mm8 Aug 29 '17 at 14:28
  • The WebBrowser may have to be added to the visual tree for the LoadCompleted event to get fired. So you could inject your class with a WebBrowser control that you have defined in your XAML rather than creating a new one that is never shown. But this has nothing to do with TaskCompletionSource. – mm8 Aug 29 '17 at 14:40
  • No. `SetResult` doesn't get called. It never enters `_WB_LoadCompleted`. I'll try with a XAML-defined `WebBrowser` and get back. – dotNET Aug 29 '17 at 14:57
  • It doesn't even work with a WebBrowser control that is added directly in XAML. – dotNET Aug 29 '17 at 15:33
  • Yes, provided that you use a WebBrowser control that is actually displayed on the screen. It doesn't make much sense to use a WebBrowser control if you don't intend to display the web page in your application. – mm8 Aug 30 '17 at 09:51
  • It actually does. I'm doing some scraping using `WebBrowser`. I know there are `WebClient` and other such classes available for downloading contents from URLs, but somehow web servers detect `WebClient` as bots, while they treat `WebBrowser` requests as normal users, so I was forced to use it. – dotNET Aug 30 '17 at 10:46
  • Well, that's another story that has nothing to do with your original question really. – mm8 Aug 30 '17 at 10:51
  • No other control in WinForms or WPF that I know of forces users to actually display it on the screen for it to work correctly. Also, since the original question explicitly creates the `WebBrowser` control inside the non-UI class, there is no reason to believe that the control is visible on the screen. Nonetheless, MS should explicitly document that `WebBrowser` won't work until it is seen (reminds me of the [double-slit experiment](https://www.youtube.com/watch?v=DfPeprQ7oGc) really!). Anyway, your continuous input has led me to create a working solution, so a bundle of thanks. – dotNET Aug 30 '17 at 11:45
  • @dotNET, check if [this](https://stackoverflow.com/a/22262976/1768303) is what you're looking for – noseratio Aug 30 '17 at 19:47
  • @Noseratio: Thanks a bunch. That looks promising. – dotNET Aug 31 '17 at 03:30

The only solution that has worked for me so far is using the Windows Forms WebBrowser control with the following code:

public string Download(string url)
{
  bool flag = false;

  using (System.Windows.Forms.WebBrowser WB = new System.Windows.Forms.WebBrowser())
  {
    WB.DocumentCompleted += (sender, e) =>
    {
      var docTemp = WB.Document;
      foreach (HtmlElement imgElemt in docTemp.Images)
        imgElemt.SetAttribute("src", "");

      flag = true;
    };

    WB.Navigate(url);

    while (!flag)
      Application.DoEvents();

    return WB.Document.Body.InnerHtml;
  }
}

If I simply swap the WinForms control for the WPF WebBrowser control in this code, it never reaches the return line.
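
One caveat with the WinForms loop above is that it spins forever if `DocumentCompleted` never fires. A possible refinement, sketched here under assumptions, is to bound the wait with a timeout. `MessagePump` and `PumpUntil` are hypothetical helper names, not framework APIs; the pump action is passed in so the helper itself does not depend on WinForms.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class MessagePump
{
    // Calls pump() repeatedly until isDone() returns true or the timeout elapses.
    // Returns true if isDone() became true, false if the wait timed out.
    public static bool PumpUntil(Func<bool> isDone, Action pump, TimeSpan timeout)
    {
        var clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed >= timeout)
                return false;

            pump();             // e.g. Application.DoEvents in a WinForms app
            Thread.Sleep(10);   // brief pause so the loop does not peg a CPU core
        }
        return true;
    }
}
```

In the `Download` method above, the `while (!flag)` loop could then become `if (!MessagePump.PumpUntil(() => flag, Application.DoEvents, TimeSpan.FromSeconds(30))) return null;`, where the 30-second limit is an arbitrary value chosen for illustration.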

Looking for a better answer.

dotNET