10

I've switched over to .net Core for some projects and am now having a problem with Parallel.ForEach. In the past I often had a List of id values which I then would use to make web requests in order to get the full data. It would look something like this:

Parallel.ForEach(myList, l =>
{
    // make web request using l.id 
    // process the data somehow
});

Well, in .net Core the web requests must all be tagged await which means the Parallel.ForEach action must be tagged with async. But, tagging a Parallel.ForEach action as async means we have a void async method which causes issues. In my particular case that means the response returns to the application before all of the web requests in the Parallel loop are completed which is both awkward and causes errors.

Question: What are the alternatives to using Parallel.ForEach here?

One possible solution I found was to wrap the Parallel loop inside of a Task and await the task:

await Task.Run(() => Parallel.ForEach(myList, l =>
{
    // stuff here
}));

(found here:Parallel.ForEach vs Task.Run and Task.WhenAll)

But, that isn't working for me. When I use that I still end up returning to the application before the loop is completed.

Another option:

var tasks = new List<Task>();
foreach (var l in myList)
{
    tasks.Add(Task.Run(async () =>
    {
         // stuff here
    }));
}
await Task.WhenAll(tasks);

This appears to work, but is that the only option? It seems that the new .net Core has rendered Parallel.ForEach virtually useless (at least when it comes to nested web calls).

Any assistance/advice is appreciated.

Community
  • 1
  • 1
nurdyguy
  • 2,876
  • 3
  • 25
  • 32
  • 9
    `async/await` was designed for long and blocking **I/O operations**, while `Parallel` is designed for long and blocking **CPU operations**. If you find yourself trying to write asynchronous code inside a `Parallel` function body, then you are doing something wrong. Consider using [Task.WhenAll](https://msdn.microsoft.com/en-us/library/system.threading.tasks.task.whenall(v=vs.110).aspx) instead. – Matias Cicero Sep 30 '16 at 15:03
  • 3
    In addition to above comment, when you do Task.Run(async() => ...), you also almost always doing something wrong. – Evk Sep 30 '16 at 15:05
  • You should look into [TPL Dataflow](https://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx). Makes your life a lot easier. It's not a part of the .NET Framework but you can use nuget to get it, – ThePerplexedOne Sep 30 '16 at 15:05
  • @MatiasCicero thank you for explaining the difference between the intended use, it was very enlightening. – nurdyguy Sep 30 '16 at 15:27
  • @Evk If that is the wrong way, can you point me in the direction of the correct way? – nurdyguy Sep 30 '16 at 15:27

4 Answers4

29

Why Parallel.ForEach is not good for this task is explained in comments: it's designed for CPU-bound (CPU-intensive) tasks. If you use it for IO-bound operations (like making web requests) - you will waste thread pool thread blocked while waiting for response, for nothing good. It's possible to use it still, but it's not best for this scenario.

What you need is to use asynchronous web request methods (like HttpWebRequest.GetResponseAsync), but here comes another problem - you don't want to execute all your web requests at once (like another answer suggests). There may be thousands urls (ids) in your list. So you can use thread synchronization constructs designed for that, for example Semaphore. Semaphore is like queue - it allows X threads to pass, and the rest should wait until one of busy threads will finish it's work (a bit simplified description). Here is an example:

static async Task ProcessUrls(string[] urls) {
    var tasks = new List<Task>();
    // semaphore, allow to run 10 tasks in parallel
    using (var semaphore = new SemaphoreSlim(10)) {
        foreach (var url in urls) {
            // await here until there is a room for this task
            await semaphore.WaitAsync();
            tasks.Add(MakeRequest(semaphore, url));
        }
        // await for the rest of tasks to complete
        await Task.WhenAll(tasks);
    }
}

private static async Task MakeRequest(SemaphoreSlim semaphore, string url) {
    try {
        var request = (HttpWebRequest) WebRequest.Create(url);

        using (var response = await request.GetResponseAsync().ConfigureAwait(false)) {
            // do something with response    
        }
    }
    catch (Exception ex) {
        // do something
    }
    finally {
        // don't forget to release
        semaphore.Release();
    }
}
Gianlucca
  • 1,334
  • 2
  • 14
  • 26
Evk
  • 98,527
  • 8
  • 141
  • 191
  • Thank you for this. I don't think that is the way we are going to go right now but it is something to keep in mind for the future. – nurdyguy Sep 30 '16 at 18:59
15

Neither of those 3 apporaches are good.

You should not use the Parallel class, or Task.Run on this scenario.

Instead, have an async handler method:

private async Task HandleResponse(Task<HttpResponseMessage> gettingResponse)
{
     HttpResponseMessage response = await gettingResponse;
     // Process the data
}

And then use Task.WhenAll:

Task[] requests = myList.Select(l => SendWebRequest(l.Id))
                        .Select(r => HandleResponse(r))
                        .ToArray();

await Task.WhenAll(requests);
Matias Cicero
  • 25,439
  • 13
  • 82
  • 154
1

You should call the methods using the ref keyword to get the job done and that should do the trick with minimum effort. This approach worked well for me in a similar situation.

Parallel.ForEach(myList, l =>
{

    // make web request using ref l.id 
    string id=l.id;
    WebRequest webRequest= MakeRequest(ref id);
    // process the data somehow
});

private WebRequest MakeRequest(ref string id)
{
  //make and return web request
}
  • but if you have thousands of ids you will wind up with thousands of requests so you limit by specifying `MaxDegreeOfParallelism` – Mbithy Mbithy May 08 '19 at 08:55
  • Note `Parallel.ForEach` won't try to execute the action for all items in the list, it will manage the batch size for you, `MaxDegreeOfParallelism` allows you to specify the maximum number of concurrent threads to use. In most cases you should not need to specify the max, but after observation of your code at runtime you may find that you want to limit the concurrency to allow other (perhaps parallel or async) processes to execute with higher priority. – Chris Schaller Jul 26 '19 at 02:01
-2

I assume this code will work:

for (int i = 0; i < myList.Length; i++)
{
    var item = myList[i];

    var msg = await SendAsync(item.Id);
    //Post Process
}
Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Mani
  • 335
  • 2
  • 10