My objective is to load multiple links at the same time, creating one task per link.

Each task calls an async method that parses the link and returns sublinks; those sublinks are in turn parsed (using WebBrowser) until they yield a download link.

The first async method calls two subsequent methods to do that work.

My problem is that Task.Factory.ContinueWhenAll fires as soon as every first method has finished; it does not wait for the rest of the work. I only want to continue once all download links are ready, which may take several rounds of page parsing.

Currently my code is the following:

var tasks = new List<Task>();
for (var index = 0; index < items_checklist.CheckedItems.Count; index++)
{
    var item = items_checklist.CheckedItems[index];
    Task task = Task.Factory.StartNew(
                    () => GetMirrors(((Item) item).Value, ((Item) item).Text)
                    , CancellationToken.None
                    , TaskCreationOptions.None
                    , TaskScheduler.FromCurrentSynchronizationContext()
    );
    tasks.Add(task);
}

Task.Factory.ContinueWhenAll(tasks.ToArray(), GetLinks_Finished =>
{
    SetLinksButtonText(@"Links Ready");
    SetLinksButtonState(false);
    SetDownloadButtonState(true);
    Cursor.Current = DefaultCursor;
});

The continuation runs as soon as every GetMirrors call has returned. However, GetMirrors only starts the work: it triggers "tempbrowser_DocumentCompleted" (the WebBrowser completion event), which in turn calls "LoadLinkIntoQueue" to put the download link into the queue.

I want ContinueWhenAll to resume only after every LoadLinkIntoQueue call has executed.

What is my logic missing?

Elie-M
  • I do not know if there is support for this. If not, you have to keep a counter of how many threads are still running. Add a continuation to each final task that just decrements that counter and, once the last task has finished, fires whatever final logic is needed. – Christopher Dec 28 '17 at 16:25
  • This sounds like a job for [Dataflow](https://blog.stephencleary.com/2012/09/introduction-to-dataflow-part-1.html) or [RX.Net](http://reactivex.io/intro.html) – JSteward Dec 28 '17 at 16:42
  • You can do this without dataflow or rx.net. Post the code for the GetMirrors method. – Sievajet Dec 28 '17 at 18:44
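The counter idea from the first comment could be sketched roughly like this. `PendingWork`, `Signal` and `Completion` are hypothetical names that do not appear in the question; the sketch assumes every final step (for example each LoadLinkIntoQueue call) signals exactly once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: completes a Task once Signal() has been called 'count' times.
public sealed class PendingWork
{
    private int _remaining;
    private readonly TaskCompletionSource<bool> _allDone = new TaskCompletionSource<bool>();

    public PendingWork(int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        _remaining = count;
    }

    // Await this (or attach a continuation) to run the final UI update.
    public Task Completion => _allDone.Task;

    // Call once from each final step; safe to call from any thread.
    public void Signal()
    {
        if (Interlocked.Decrement(ref _remaining) == 0)
        {
            _allDone.TrySetResult(true);
        }
    }
}
```

With something like this, the code that currently lives in the ContinueWhenAll callback would be attached to `Completion` instead, and each LoadLinkIntoQueue call would end with `Signal()`.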

1 Answer

You can create a TaskCompletionSource in your GetMirrors method, which is the method passed to Task.Factory.StartNew inside the for loop over the urls to process.

In GetMirrors you hook up the DocumentCompleted event of a new WebBrowser. The handler calls SetResult on the TaskCompletionSource, which causes its task to transition to Completed.

Your implementation would be like this:

Task<string> GetMirrors(string url, string somethingelse )
{

    // this will signal that the Task is completed
    // we want the parent to wait
    var tcs = new TaskCompletionSource<string>(TaskCreationOptions.AttachedToParent);

    // give each task their own WebBrowser instance
    WebBrowser tempbrowser = new WebBrowser();
    tempbrowser.ScriptErrorsSuppressed = true;
    this.Controls.Add(tempbrowser);

    tempbrowser.DocumentCompleted += (s, e) => {
        // LoadLinkIntoQueue call 
        // we have a result, so signal to the completion source that we're done;
        // TrySetResult (unlike SetResult) does not throw if the event fires again
        tcs.TrySetResult(e.Url.ToString());

        this.Controls.Remove(tempbrowser);
    };

    // hook up errorhandling if you need that, left as an exercise.

    tempbrowser.Navigate(url);
    // we return the Task from the completion source
    return tcs.Task;
}

You can also call SetException on the TaskCompletionSource instance if you want exceptions that occur to be reported through the task.

Notice that this code instantiates a WebBrowser for each task, so you don't have to serialize the tasks through a single shared WebBrowser control.
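To show the pattern in isolation, here is a minimal console-friendly sketch. `FakeBrowser` is a hypothetical stand-in for WebBrowser (so it runs without WinForms), and `MirrorDemo`, `GetMirrorAsync`, `GetAllAsync` and the "download:" prefix are made up for illustration. The fake deliberately raises its event twice, which is why the handler uses TrySetResult.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for WebBrowser; not a real WinForms type.
public sealed class FakeBrowser
{
    public event Action<string> DocumentCompleted;

    // Simulates a page load that raises DocumentCompleted twice,
    // similar to a redirect or popup triggering a second event.
    public void Navigate(string url)
    {
        Task.Run(async () =>
        {
            await Task.Delay(50);
            DocumentCompleted?.Invoke(url);
            DocumentCompleted?.Invoke(url);
        });
    }
}

public static class MirrorDemo
{
    // Returns a task that completes when the first DocumentCompleted arrives.
    public static Task<string> GetMirrorAsync(string url)
    {
        var tcs = new TaskCompletionSource<string>();
        var browser = new FakeBrowser();

        browser.DocumentCompleted += loadedUrl =>
        {
            // The second event is ignored instead of throwing.
            tcs.TrySetResult("download:" + loadedUrl);
        };

        try
        {
            browser.Navigate(url);
        }
        catch (Exception ex)
        {
            // Report a failed navigation through the task.
            tcs.TrySetException(ex);
        }

        return tcs.Task;
    }

    // Completes only when every per-url task has produced its result.
    public static async Task<string[]> GetAllAsync(IEnumerable<string> urls)
    {
        Task<string>[] tasks = urls.Select(url => GetMirrorAsync(url)).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

Calling `await MirrorDemo.GetAllAsync(new[] { "a", "b" })` resumes only after both simulated pages have raised their event, which is the behaviour the question asks for.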

rene
  • @Sievajet yes, this is my guess at their implementation of the method that sits in the loop the OP shows, which is what creates all the tasks. I didn't want to add all that boilerplate again. But you're right, this code on its own will not work for multiple URLs; it does work in context with the OP's code with multiple urls, at least in my testing it did. – rene Dec 28 '17 at 20:22
  • Initial implementation and testing show that this works nearly as intended. I need to run more tests with debug output; if it works I will update my question and make this the accepted answer, thanks. – Elie-M Dec 28 '17 at 21:49
  • This is doing exactly what I wanted. I also wanted a single WebBrowser per thread, so this meets that requirement too. Though now the DocumentCompleted event is triggering twice per browser and I'm getting 'System.InvalidOperationException' in mscorlib.dll and 'System.Reflection.TargetInvocationException' in mscorlib.dll; I can work on those on my own. I will study the code, though it would help me to know what exactly my mistake was. Thanks. – Elie-M Dec 28 '17 at 22:28
  • @Elie-M the double trigger is maybe caused by a redirect response or a script that sets a new document url. I didn't see that in my testing but I used stackexchange sites as the urls. As for the TargetInvocationException: I assumed the code you showed was called from a Winforms click or load event. In that case your option `TaskScheduler.FromCurrentSynchronizationContext()` should prevent those exceptions. What type of call is causing that exception? – rene Dec 28 '17 at 22:40
  • I solved them all, it turns out that there's another link (a popup mostly I think) loading with the intended original link. I added a check for that and it solved all the problems (double event and the exceptions). Thank you. Can you tell me what was my original mistake? – Elie-M Dec 28 '17 at 22:44
  • The Tasks simply run and don't pay attention to any of the DocumentCompleted events. When the WebBrowser component was created, `Tasks` didn't exist, so the WebBrowser has no way of cooperating with the async/await work you expect it to. All the WebBrowser knows is that it has been given a thread and is allowed to run. The TaskCompletionSource is the trick that brings those non-task-aware components into line with the modern async/await, Task-based pattern. So if anything, that was the original oversight. – rene Dec 28 '17 at 23:00
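The difference described in that last comment can be reproduced in a few lines. This is only an illustrative sketch: `Page`, `RaiseLoaded` and `EarlyCompletionDemo` are hypothetical names, and the event is raised by hand rather than by a real browser.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical event source standing in for a component that predates Task.
public sealed class Page
{
    public event Action Loaded;

    public void RaiseLoaded()
    {
        Loaded?.Invoke();
    }
}

public static class EarlyCompletionDemo
{
    // The task only subscribes to the event, so it completes
    // as soon as the subscription is done, not when Loaded fires.
    public static Task StartWithoutTcs(Page page, Action onLoaded)
    {
        return Task.Run(() => { page.Loaded += onLoaded; });
    }

    // The returned task stays incomplete until Loaded actually fires.
    public static Task StartWithTcs(Page page)
    {
        var tcs = new TaskCompletionSource<bool>();
        page.Loaded += () => tcs.TrySetResult(true);
        return tcs.Task;
    }
}
```

A continuation attached to the first kind of task runs before the event has fired, which matches what the question observed; a continuation attached to the second kind runs only afterwards.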