
I've read a closely related thread, but I'd like to ask about something more specific.

If we need to run tasks/methods asynchronously, and those tasks themselves run or await other tasks, which variant is preferred: Parallel.ForEach or Task.WhenAll? I'll demonstrate with some code below:

public async Task SomeWorker(string param1, HttpClient client,
    List<FillMeUp> emptyCollection)
{
    HttpRequestMessage message = new HttpRequestMessage();
    message.Method = HttpMethod.Get;
    message.Headers.Add("someParam", param1);
    message.RequestUri = new Uri("https://www.somesite.me");
    var requestResponse = await client.SendAsync(message).ConfigureAwait(false);
    var content = await requestResponse.Content.ReadAsStringAsync()
        .ConfigureAwait(false);
    emptyCollection.Add(new FillMeUp()
    {
        Param1 = param1
    });
}

Used with WhenAll:

using (HttpClient client = new HttpClient())
{
    client.DefaultRequestHeaders.Add("Accept", "application/json");

    List<FullCollection> fullCollection = GetMyFullCollection();
    List<FillMeUp> emptyCollection = new List<FillMeUp>();
    List<Task> workers = new List<Task>();
    for (int i = 0; i < fullCollection.Count; i++)
    {
        workers.Add(SomeWorker(fullCollection[i].SomeParam, client,
            emptyCollection));
    }

    await Task.WhenAll(workers).ConfigureAwait(false);

    // Do something below with already completed tasks
}

Or, all of the above written in a Parallel.ForEach():

using (HttpClient client = new HttpClient())
{
    client.DefaultRequestHeaders.Add("Accept", "application/json");

    List<FullCollection> fullCollection = GetMyFullCollection();
    List<FillMeUp> emptyCollection = new List<FillMeUp>();
    Parallel.ForEach<FullCollection>(fullCollection, (fullObject) =>
    {
       HttpRequestMessage message = new HttpRequestMessage();
       message.Method = HttpMethod.Get;
       message.Headers.Add("someParam", fullObject.SomeParam);
       message.RequestUri = new Uri("https://www.somesite.me");
       var requestResponse = client.SendAsync(message)
           .GetAwaiter().GetResult();
       var content = requestResponse.Content.ReadAsStringAsync()
           .GetAwaiter().GetResult();
       emptyCollection.Add(new FillMeUp()
       {
          Param1 = fullObject.SomeParam
       });
    });
}

I'm aware that List<T> is not thread-safe. It's only there to illustrate the nature of my question.

Both SendAsync (on HttpClient) and ReadAsStringAsync (on HttpContent) are asynchronous, so inside Parallel.ForEach they have to be called synchronously, by blocking on their results.

Is that preferred over the Task.WhenAll route? I've tried various performance tests, and I can't seem to find a difference.

Theodor Zoulias
SpiritBob
    The fact you had to write `GetAwaiter().GetResult()` inside `Parallel.ForEach` shows it's not meant for async operations. `Parallel.ForEach` is meant for data parallelism, i.e. processing large amounts of data, e.g. an array with 1M items. – Panagiotis Kanavos Oct 07 '19 at 15:40
  • @PanagiotisKanavos Yeah, those should definitely be used in an async method, thank you for the insight! – SpiritBob Oct 07 '19 at 15:42

2 Answers


In general you need the Parallel class when you have loads of work for your CPU to do. You need the Task class when you have to wait for loads of work that the external world will do for you.

  • Parallel = Calculations. (CPU-bound)
  • Task = Waiting for web servers, file systems and databases to respond. (I/O-bound)
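A minimal sketch of the distinction above, assuming a console project on a recent .NET. The class and method names (WorkKinds, SquaresParallel, SimulatedIoAsync) are hypothetical, and Task.Delay stands in for a real web server, file system or database call:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WorkKinds
{
    // CPU-bound: Parallel.For keeps several cores busy doing calculations.
    public static long[] SquaresParallel(int count)
    {
        var results = new long[count];
        Parallel.For(0, count, i =>
        {
            // Each iteration writes only its own slot, so no locking is needed.
            results[i] = (long)i * i;
        });
        return results;
    }

    // I/O-bound: start every simulated request, then await them all together.
    // No thread sits blocked while the delays are pending.
    public static async Task<int[]> SimulatedIoAsync(int count)
    {
        Task<int>[] requests = Enumerable.Range(0, count)
            .Select(async i =>
            {
                await Task.Delay(100); // placeholder for waiting on an external server
                return i;
            })
            .ToArray();
        return await Task.WhenAll(requests); // results come back in input order
    }
}
```

In the first method the threads are doing the work; in the second they are merely waiting, which is why awaiting is cheaper than occupying a thread per request.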

Starting from .NET 6, the Parallel class is also equipped with the asynchronous ForEachAsync method, which accepts an asynchronous body delegate and supports the same options as the synchronous Parallel methods. The most important option is MaxDegreeOfParallelism, because web servers, file systems and databases all perform miserably when flooded with requests. Before .NET 6 we had to resort to advanced tools like the TPL Dataflow library, or write custom ForEachAsync implementations, as shown in a related question.
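A sketch of Parallel.ForEachAsync with a throttle, assuming .NET 6 or later. FetchAsync is a hypothetical stand-in for the question's SendAsync/ReadAsStringAsync pair (it only calls Task.Delay), and the in-flight counter exists purely to show that MaxDegreeOfParallelism caps the number of concurrent calls:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRequests
{
    public static async Task<(List<string> Results, int MaxConcurrency)> RunAsync(
        IEnumerable<string> parameters, int maxDegreeOfParallelism)
    {
        var results = new List<string>();
        var gate = new object();
        int inFlight = 0, maxConcurrency = 0;

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        await Parallel.ForEachAsync(parameters, options, async (param, token) =>
        {
            // Track how many bodies are running at once (for illustration only).
            lock (gate) { inFlight++; maxConcurrency = Math.Max(maxConcurrency, inFlight); }
            try
            {
                string body = await FetchAsync(param, token);
                lock (gate) results.Add(body); // List<T> is not thread-safe, so guard it
            }
            finally
            {
                lock (gate) inFlight--;
            }
        });
        return (results, maxConcurrency);
    }

    // Hypothetical placeholder for an HTTP round trip.
    private static async Task<string> FetchAsync(string param, CancellationToken token)
    {
        await Task.Delay(50, token);
        return "response for " + param;
    }
}
```

The body delegate is asynchronous, so no thread is blocked on GetAwaiter().GetResult(), and the option value decides how hard the remote side gets hit.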

It should be noted that Tasks can also be used for CPU-bound work. The Parallel class uses Tasks internally as building blocks. This relationship is apparent from the umbrella term Task Parallel Library (TPL).
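To illustrate Tasks carrying CPU-bound work, here is a small sketch that offloads chunks of a calculation with Task.Run and combines them with Task.WhenAll. CountPrimes is a hypothetical example workload, and the chunking scheme is just one simple choice:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class CpuTasks
{
    // Naive trial-division prime count over [from, to).
    private static int CountPrimes(int from, int to)
    {
        int count = 0;
        for (int n = Math.Max(from, 2); n < to; n++)
        {
            bool isPrime = true;
            for (int d = 2; (long)d * d <= n; d++)
            {
                if (n % d == 0) { isPrime = false; break; }
            }
            if (isPrime) count++;
        }
        return count;
    }

    // Splits [0, limit) into chunks; Task.Run queues each chunk to the thread pool,
    // the same pool the Parallel class uses by default.
    public static async Task<int> CountPrimesAsync(int limit, int chunks)
    {
        int size = (limit + chunks - 1) / chunks;
        Task<int>[] parts = Enumerable.Range(0, chunks)
            .Select(c => Task.Run(() => CountPrimes(c * size, Math.Min(limit, (c + 1) * size))))
            .ToArray();
        int[] counts = await Task.WhenAll(parts);
        return counts.Sum();
    }
}
```

For a loop like this, Parallel.For would usually be the more natural fit; the point is only that the same Task type sits underneath both.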

Theodor Zoulias

I don't think the main consideration here is performance. (It always is, of course :-) but read on: using the correct construct for the case at hand is what gives you the best performance.)

Think of Parallel.ForEach as a special ForEach that parallelizes individual (synchronous) work items. While you could shove already asynchronous operations into it (by blocking on them), that seems contrived and is a misuse: you lose the async/await benefits of each task by doing so. The only "benefit" you get out of it is that, from the standpoint of your code flow, its behavior is synchronous: it will not return until all of its iterations have completed.

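Since your individual tasks are already async, that last feature of Parallel.ForEach (waiting until everything has completed) is exactly what Task.WhenAll gives you, without blocking threads while the requests are in flight.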
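A sketch of the question's WhenAll variant reworked so each worker returns its result instead of adding to a shared List<T>. FillMeUp mirrors the question's type; SomeWorkerAsync and RunAllAsync are hypothetical names, and Task.Delay is a placeholder for the SendAsync/ReadAsStringAsync calls:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class FillMeUp
{
    public string Param1 { get; set; } = "";
}

public static class WhenAllExample
{
    // Returns its own result, so no locking or concurrent collection is needed.
    private static async Task<FillMeUp> SomeWorkerAsync(string param1)
    {
        await Task.Delay(10).ConfigureAwait(false); // placeholder for the HTTP round trip
        return new FillMeUp { Param1 = param1 };
    }

    public static async Task<List<FillMeUp>> RunAllAsync(IEnumerable<string> parameters)
    {
        // Start every worker up front; none of them occupies a thread while waiting.
        Task<FillMeUp>[] workers = parameters.Select(p => SomeWorkerAsync(p)).ToArray();
        // Completes only when all workers are done, like Parallel.ForEach returning,
        // but the caller's thread is released in the meantime.
        FillMeUp[] filled = await Task.WhenAll(workers).ConfigureAwait(false);
        return filled.ToList(); // same order as the input
    }
}
```

Letting Task.WhenAll collect the results also sidesteps the thread-safety issue the question mentions.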

G. Stoynev