54

I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:

// Serial execution
public async Task<List<Thing>> GetThings()
{
    var first = await GetExpensiveThing();
    var second = await GetExpensiveThing();
    return new List<Thing>() { first, second };
}

But since it's an expensive method, I want to execute these calls in parallel. I would have thought moving the awaits would have solved this:

// Serial execution
public async Task<List<Thing>> GetThings()
{
    var first = GetExpensiveThing();
    var second = GetExpensiveThing();
    return new List<Thing>() { await first, await second };
}

That didn't work, so I wrapped them in some tasks and this works:

// Parallel execution
public async Task<List<Thing>> GetThings()
{
    var first = Task.Run(() =>
    {
        return GetExpensiveThing();
    });

    var second = Task.Run(() =>
    {
        return GetExpensiveThing();
    });

    return new List<Thing>() { first.Result, second.Result };
}

I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.

Is there a better way to run async methods in parallel, or are tasks a good approach?

Dave New
  • 3
    @bside You're misrepresenting the linked post. It correctly states that continuations in `async` functions are scheduled on the captured context. But in most cases that context is the default `SynchronizationContext`, which schedules continuations onto the thread pool, causing them to run in parallel. And even in WPF and ASP apps you can work around it with `ConfigureAwait(false)`. `Task.Run` is used with CPU bound tasks and you don't need it to run continuations in parallel. – V0ldek Mar 01 '20 at 15:15
  • @V0ldek Yes, you are right. Thank you for drawing my attention to this. – Andrii Viazovskyi Mar 24 '20 at 17:28

4 Answers

54

Is there a better way to run async methods in parallel, or are tasks a good approach?

Yes, the "best" approach is to use the Task.WhenAll method. However, your second approach should also have run in parallel. I have created a .NET Fiddle that demonstrates this.

Consider the following:

public Task<Thing[]> GetThingsAsync()
{
    var first = GetExpensiveThingAsync();
    var second = GetExpensiveThingAsync();

    return Task.WhenAll(first, second);
}

Note

It is preferred to use the "Async" suffix: instead of GetThings and GetExpensiveThing, we should have GetThingsAsync and GetExpensiveThingAsync respectively (source).
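If the caller still needs a List<Thing>, as in the question, the result of Task.WhenAll can be awaited and copied into a list. The following is only a minimal sketch: the Thing class, the id parameter, and the Task.Delay standing in for the expensive I/O are assumptions made for illustration, not the original implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical result type; the question does not show the real one.
public class Thing
{
    public int Id { get; set; }
}

public static class WhenAllDemo
{
    // Hypothetical stand-in for the expensive I/O call; Task.Delay simulates latency.
    public static async Task<Thing> GetExpensiveThingAsync(int id)
    {
        await Task.Delay(500);
        return new Thing { Id = id };
    }

    public static async Task<List<Thing>> GetThingsAsync()
    {
        // Both operations are started here, before anything is awaited.
        Task<Thing> first = GetExpensiveThingAsync(1);
        Task<Thing> second = GetExpensiveThingAsync(2);

        // WhenAll returns the results in the order the tasks were passed in.
        Thing[] things = await Task.WhenAll(first, second);
        return new List<Thing>(things);
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        List<Thing> things = await GetThingsAsync();
        stopwatch.Stop();
        Console.WriteLine($"{things.Count} things in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

With the simulated 500 ms delay, the elapsed time should be close to one delay rather than two, since the two operations overlap.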

David Pine
  • 3
    `await Task.WhenAll` will return `Thing[]`, so there's no need for `Result` (in fact, `Result` will wrap exceptions; you should use `await` or use the result of `await Task.WhenAll`). – Stephen Cleary Jul 28 '16 at 11:10
  • 2
    Nice catch, thank you - it is early and I've been "awaiting" my coffee. :) – David Pine Jul 28 '16 at 11:12
  • 1
    Whilst this is the generally correct way, what is this doing that `return new List() { await first, await second };` is not? If OP said that didn't work, there must be something else at play... – yaakov Jul 28 '16 at 11:13
  • 1
    This doesn't work for me. These methods are still executing sequentially. I can only seem to achieve what I want using those tasks. Would the inner implementation of GetExpensiveThing(), an async method itself, have any effect on this? – Dave New Jul 28 '16 at 11:17
  • @davenewza what does the implementation of `GetExpensiveThing` look like? – David Pine Jul 28 '16 at 11:18
  • Quite right. GetExpensiveThing() had an operation which wasn't awaited. The second approach works now. – Dave New Jul 28 '16 at 11:38
  • Based on this answer, I use `var tasks = new Task[] { GetExpensiveThingAsync(), GetAnotherExpensiveThingAsync(), }; await Task.WhenAll(tasks);` – Nae Dec 31 '20 at 15:30
  • If the expensive work is CPU-bound, don't use async; use `Task.Run` instead. – Glitch Nov 27 '22 at 03:03
35

Task.WhenAll() tends to perform poorly when a large number of tasks are fired simultaneously without any throttling.

If you are running a large list of tasks and want to await the overall outcome, I suggest partitioning the work with a limit on the degree of parallelism.

I have adapted the elegant approach from Stephen Toub's blog to modern method-syntax LINQ:

public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
    async Task AwaitPartition(IEnumerator<T> partition)
    {
        using (partition)
        {
            while (partition.MoveNext())
            {
                await Task.Yield(); // prevents a sync/hot thread hangup
                await funcBody(partition.Current);
            }
        }
    }

    return Task.WhenAll(
        Partitioner
            .Create(source)
            .GetPartitions(maxDoP)
            .AsParallel()
            .Select(p => AwaitPartition(p)));
}

How it works is simple: take an IEnumerable, dissect it into roughly even partitions, and then fire a function/method against each element in each partition, with all partitions running at the same time. No more than one element per partition is processed at any one time, so there are n Tasks in flight for n partitions.

Extension Usage:

await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
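A fuller usage sketch may help show the throttling effect. The extension method below is copied from the answer. The ThrottleDemo class, the MeasurePeakAsync helper, and the Task.Delay standing in for real I/O are hypothetical names and assumptions added for illustration only. The helper counts how many invocations of the body are in flight at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelExtensions
{
    // Copied from the answer above.
    public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
    {
        async Task AwaitPartition(IEnumerator<T> partition)
        {
            using (partition)
            {
                while (partition.MoveNext())
                {
                    await Task.Yield();
                    await funcBody(partition.Current);
                }
            }
        }

        return Task.WhenAll(
            Partitioner
                .Create(source)
                .GetPartitions(maxDoP)
                .AsParallel()
                .Select(p => AwaitPartition(p)));
    }
}

// Hypothetical harness, not part of the original answer.
public static class ThrottleDemo
{
    // Runs `count` simulated I/O calls and returns the highest number observed in flight at once.
    public static async Task<int> MeasurePeakAsync(int count, int maxDoP)
    {
        var gate = new object();
        int inFlight = 0, peak = 0;

        await Enumerable.Range(0, count).ParallelForEachAsync(async item =>
        {
            lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
            await Task.Delay(20); // simulated I/O (assumption)
            lock (gate) { inFlight--; }
        }, maxDoP);

        return peak;
    }

    public static async Task Main()
    {
        int peak = await MeasurePeakAsync(count: 40, maxDoP: 3);
        Console.WriteLine($"Peak concurrency: {peak}");
    }
}
```

Because each partition awaits one element at a time, the reported peak should never exceed the maxDoP that was passed in.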

Edit: I now keep some overloads in a repository on GitHub if you need more options. It's also available as a NuGet package for .NET Standard.

Edit 2: Thanks to comments from Theodor below, I was able to mitigate poorly written Async Tasks from blocking parallelism by using await Task.Yield();.

HouseCat
  • 1
    Nice! This is perfect if you want to use Stephen Toub's approach but prefer to use method-syntax LINQ. – David Tarulli Dec 06 '18 at 19:57
  • I am glad you like it, that's exactly what I was going for :) – HouseCat Dec 09 '18 at 14:41
  • 1
    The `AsParallel` is redundant. The `Select` that follows is doing no CPU-intensive work. Also by removing the `Task.Run` from Stephen Toub's code you risk a reduced degree of parallelism in case the method `funcBody` is CPU-intensive and completes synchronously. – Theodor Zoulias Mar 01 '20 at 12:46
  • AsParallel takes care of Task.Run and Select combined @TheodorZoulias – Dan Hunex Apr 29 '20 at 19:19
  • @DanHunex replacing the `Task.Run` with `AsParallel` introduces artificial limits to the degree of parallelism, because the configuration `WithDegreeOfParallelism` is missing, so the default `Environment.ProcessorCount` is used. Take a look at [this](https://dotnetfiddle.net/J4epIE) fiddle, where the requested `maxDoP: 10` is not respected. Then comment the line `.ParallelForEachAsync_HouseCat` and uncomment the line `//.ParallelForEachAsync_StephenToub`, and see that now the requested DOP is respected. – Theodor Zoulias Apr 30 '20 at 02:51
  • I agree with Dan Hunex, but I will re-test the quality of the code implementation. – HouseCat May 01 '20 at 14:51
  • Theodor, you are wrong about the parallelism but right about the behavior. The AsParallel() does in fact replace Task.Run(() =>());, but your demonstration showed that a heavy duty synchronous /hot task could invoke execution and hold up the next element. That being said Task.Run(() => will quickly drain from your thread pool and is unnecessary to solve this behavior. I would recommend adding `await Task.Yield();` to my code to prevent these situations. Try it in your fiddle above that very gross Thread.Sleep (which you shouldn't do in async land and use await Task.Delay()). – HouseCat Jul 27 '20 at 23:43
  • @HouseCat by combining `AsParallel`+`Task.Yield` you are close to `Task.Run`, but still the behavior is not identical. The `AsParallel` includes the current thread as one of the worker threads, so if you run your implementation in a UI application with a badly behaving `funcBody`, you'll get a frozen UI. Which is much worse than a saturated thread pool, which will happen anyway with both approaches if the `funcBody` is not implemented asynchronously. With a well-behaved `funcBody`, no saturation is going to happen with either approach. The `Task.Run` is just simpler and better for this job. – Theodor Zoulias Jul 28 '20 at 10:18
  • @HouseCat I should also note that both yours and Stephen Toub's original implementation of `ParallelForEachAsync` are flawed. In case of exceptions, one worker is killed on every exception, and the process continues with a reduced degree of parallelism. If you are unlucky to have exactly `maxDoP - 1` early exceptions, the last standing worker will slowly process all elements alone, until the exceptions are finally surfaced. A less naive implementation should make sure that an exception on any worker will terminate the whole process as fast as possible. – Theodor Zoulias Jul 28 '20 at 10:29
  • @TheodorZoulias Ah yes I agree with your assessments, but if you need fault tolerance and were worried about receiving nefarious types of Exception or Synchronization lock type tasks, I would consider a proper TPL solution. DataflowEngine for example or my Pipelines engine as an idea. https://github.com/houseofcat/RabbitMQ.Core/blob/master/CookedRabbit.Core/WorkEngines/DataflowEngine.cs https://github.com/houseofcat/RabbitMQ.Core/blob/master/CookedRabbit.Core/WorkEngines/Pipeline.cs You really know your stuff! I would love your inputs on these designs if you get a moment ^.^ – HouseCat Jul 28 '20 at 15:29
  • I am not very familiar with GitHub. Does it allow having discussions, or simply writing comments related to projects or code-files? – Theodor Zoulias Jul 28 '20 at 17:42
  • You can "fork" a copy of anything on GitHub to your account, edit it, and submit the changes to the main repo if you find issues. Otherwise you can open "issues", which work as labelled discussions similar to a forum. – HouseCat Jul 28 '20 at 22:19
3

You can use Task.WhenAll, which returns a task that completes when all of the tasks passed to it are done.

Check this question here for reference

Benjamin Soulier
3

If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.

However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:

public Task<Thing[]> GetThings() =>
    Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));

(Note I changed the return type to Thing[], which is what Task.WhenAll produces, to avoid awaits altogether.)
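To illustrate why offloading can matter, here is a small sketch comparing the two styles against a method that is not properly asynchronous. The GetExpensiveThing body below is a hypothetical stand-in: Thread.Sleep represents synchronous work done before the first await, and the TimeAsync helper is likewise an assumption for the demo.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingDemo
{
    // Hypothetical badly behaved method: it blocks the caller before its first await.
    public static async Task<int> GetExpensiveThing()
    {
        Thread.Sleep(300);   // synchronous work, runs on the calling thread
        await Task.Delay(1); // only from here on is the method asynchronous
        return 42;
    }

    // Measures how long the supplied asynchronous operation takes, in milliseconds.
    public static async Task<long> TimeAsync(Func<Task> body)
    {
        var stopwatch = Stopwatch.StartNew();
        await body();
        return stopwatch.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        // Called directly: the second call cannot start until the first one's Sleep returns.
        long direct = await TimeAsync(() =>
            Task.WhenAll(GetExpensiveThing(), GetExpensiveThing()));

        // Wrapped in Task.Run: each call's synchronous part runs on its own thread-pool thread.
        long offloaded = await TimeAsync(() =>
            Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing())));

        Console.WriteLine($"direct: {direct} ms, offloaded: {offloaded} ms");
    }
}
```

The direct version should take roughly twice as long as the offloaded one, because the blocking portions run back to back on the calling thread.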

You should avoid using the Result property. It causes the caller thread to block and wait for the task to complete, unlike await or Task.WhenAll which use continuations.
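Besides blocking, Result also changes how failures surface: it wraps the exception in an AggregateException, whereas await rethrows the original exception. A minimal sketch follows, in which FailAsync and the two Catch helpers are hypothetical names used only for this demonstration.

```csharp
using System;
using System.Threading.Tasks;

public static class ResultDemo
{
    // Hypothetical operation that always fails after a short delay.
    public static async Task<int> FailAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("boom");
    }

    // Blocks on Result and reports the type of exception that was caught.
    public static string CatchViaResult()
    {
        try { return FailAsync().Result.ToString(); }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // Awaits the task and reports the type of exception that was caught.
    public static async Task<string> CatchViaAwaitAsync()
    {
        try { return (await FailAsync()).ToString(); }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    public static async Task Main()
    {
        Console.WriteLine(CatchViaResult());           // AggregateException
        Console.WriteLine(await CatchViaAwaitAsync()); // InvalidOperationException
    }
}
```

This runs safely in a console application. In an environment with a single-threaded synchronization context, such as a UI thread, blocking on Result can additionally deadlock.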

Eli Arbel