I have a loop that needs to run in parallel because each iteration is slow and processor-intensive, but I also need to call an async method as part of each iteration.

I've seen questions on how to handle an async method in the loop but not a combination of async and synchronous, which is what I've got.

My (simplified) code is as follows. I know this won't work properly because an async lambda is being passed to Parallel.ForEach.

protected IDictionary<int, ReportData> GetReportData()
{
    var results = new ConcurrentDictionary<int, ReportData>();
      
    Parallel.ForEach(requestData, async data =>
    {
        // process data synchronously
        var processedData = ProcessData(data);

        // get some data async
        var reportRequest = await BuildRequestAsync(processedData);

        // synchronous building
        var report = reportRequest.BuildReport();

        results.TryAdd(data.ReportId, report);
     });

     // This needs to be populated before returning
     return results;
}

Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?

It's not a practical option to convert the synchronous functions to async.

I don't want to split the action up into a Parallel.ForEach, followed by the async calls with a WhenAll, followed by another Parallel.ForEach. The speed of each stage can vary greatly between iterations, so splitting would be inefficient: the faster items would wait for the slower ones before continuing.

I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach, but I have never used PLINQ and am not sure whether it would wait for all of the iterations to complete before returning, i.e. would the Tasks still be running at the end of the process?

Mog0
  • [Async/await and the Task Parallel Library don't mix](https://stebet.net/asyncawait-and-the-task-parallel-library-dont-mix/). Also see https://stackoverflow.com/a/23139769/982149 – Fildor Jan 26 '21 at 11:12
  • Does this answer your question? [Parallel.ForEach and async-await](https://stackoverflow.com/questions/23137393/parallel-foreach-and-async-await) – Fildor Jan 26 '21 at 11:14
  • `Parallel.ForEach` [is not async-friendly](https://stackoverflow.com/questions/15136542/parallel-foreach-with-asynchronous-lambda), and neither is PLINQ. AFAIK the ideal tool for processing mixed sync-async workloads is the TPL Dataflow library. You can see an example [here](https://stackoverflow.com/questions/62602684/c-sharp-process-files-concurrently-and-asynchronously/62613098#62613098). Bear in mind that TPL Dataflow has a smallish learning curve. If you don't have time for that, you can just pack the `ThreadPool` with lots of threads and process everything synchronously. – Theodor Zoulias Jan 26 '21 at 11:25
  • Can you convert the async call to a synchronous one? Using async/await is mostly for hiding the latency of I/O operations; when running in parallel that might not be useful. – JonasH Jan 26 '21 at 12:05
  • @JonasH - While it is true that the async is not useful here, the same method is used elsewhere and the async is useful there. The method ultimately calls a library method that is async-only, so it would not be simple to change. – Mog0 Jan 26 '21 at 13:30
  • Converting it to synchronous is as simple as `var reportRequest = BuildRequestAsync(processedData).GetAwaiter().GetResult();`. – Theodor Zoulias Jan 26 '21 at 13:37
  • @TheodorZoulias - Sadly, converting async to sync is not that simple. That will work in some scenarios but is pretty much a recipe for deadlocks and not recommended. There's lots of discussion about calling async code from synchronous code, and it's generally a bad idea except for some edge cases. – Mog0 Jan 26 '21 at 13:42
  • @Mog0 yes, blocking on asynchronous code is susceptible to deadlocks. But if this happens, the deadlock will show itself immediately and will demand your attention. There are multiple workarounds available to prevent the deadlock from happening. It is certainly far from ideal, mostly because it causes threads to be blocked and RAM to be wasted. It's a low-tech solution. Btw, .NET 6 will come with a much-needed new API, [`Parallel.ForEachAsync`](https://github.com/dotnet/runtime/issues/1946). – Theodor Zoulias Jan 26 '21 at 14:38

1 Answer

> Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?

Yes, but you'll need to understand what Parallel gives you that you lose when you take alternative approaches. Specifically, Parallel will automatically determine the appropriate number of threads and adjust based on usage.

> It's not a practical option to convert the synchronous functions to async.

For CPU-bound methods, you shouldn't convert them.

> I don't want to split the action up into a Parallel.ForEach, followed by the async calls with a WhenAll, followed by another Parallel.ForEach. The speed of each stage can vary greatly between iterations, so splitting would be inefficient: the faster items would wait for the slower ones before continuing.

The first recommendation I would make is to look into TPL Dataflow. It allows you to define a "pipeline" of sorts that keeps the data flowing through while limiting the concurrency at each stage.
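
As a rough illustration of that idea (a minimal sketch, not taken from the answer), the three steps could be linked as Dataflow blocks, each with its own concurrency limit. The stand-in methods, the `int`/`string` types, and the limit of 10 are assumptions made only to keep the example self-contained; the block types come from the `System.Threading.Tasks.Dataflow` NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

public static class ReportPipeline
{
    // Hypothetical stand-ins for ProcessData, BuildRequestAsync and BuildReport.
    static int ProcessData(int id) => id * 2;                 // CPU-bound step
    static async Task<int> BuildRequestAsync(int processed)   // async step
    {
        await Task.Delay(10);
        return processed + 1;
    }
    static string BuildReport(int request) => $"report-{request}"; // CPU-bound step

    public static async Task<IDictionary<int, string>> RunAsync(IEnumerable<int> ids)
    {
        var cpu = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        var io = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 }; // assumed limit
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        var results = new ConcurrentDictionary<int, string>();

        // Stage 1: synchronous processing.
        var process = new TransformBlock<int, (int Id, int Processed)>(
            id => (Id: id, Processed: ProcessData(id)), cpu);

        // Stage 2: the single async call.
        var request = new TransformBlock<(int Id, int Processed), (int Id, int Request)>(
            async x => (Id: x.Id, Request: await BuildRequestAsync(x.Processed)), io);

        // Stage 3: synchronous report building; stores the result.
        var build = new ActionBlock<(int Id, int Request)>(
            x => results[x.Id] = BuildReport(x.Request), cpu);

        process.LinkTo(request, link);
        request.LinkTo(build, link);

        foreach (var id in ids)
            await process.SendAsync(id);

        process.Complete();      // no more input; completion propagates down the links
        await build.Completion;  // wait until every item has left the last stage
        return results;
    }
}
```

Each item moves to the next stage as soon as it is ready, so a fast item is not held up by a slow one in the same stage.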

> I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach

No. PLINQ is very similar to Parallel in how it works. There are a few differences in how aggressively they use the CPU, and some API differences (e.g., if you have a collection of results coming out the end, PLINQ is usually cleaner than Parallel), but at a high level they're very similar. Both only work on synchronous code.

However, you could use a simple Task.Run with Task.WhenAll as such:

protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
  // The async lambda is the one passed to Task.Run, so the await below is
  // legal and each item's work starts on a thread pool thread.
  var tasks = requestData.Select(data => Task.Run(async () =>
  {
    // process data synchronously
    var processedData = ProcessData(data);

    // get some data async
    var reportRequest = await BuildRequestAsync(processedData);

    // synchronous building
    var report = reportRequest.BuildReport();

    return (Key: data.ReportId, Value: report);
  })).ToList();

  // Wait for every item to finish before building the dictionary.
  var results = await Task.WhenAll(tasks);
  return results.ToDictionary(x => x.Key, x => x.Value);
}

You may need to apply a concurrency limit (which Parallel would have done for you). In the asynchronous world, this would look like:

protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
  var throttle = new SemaphoreSlim(10);
  var tasks = requestData.Select(data => Task.Run(async () =>
  {
    await throttle.WaitAsync();
    try
    {
      // process data synchronously
      var processedData = ProcessData(data);

      // get some data async
      var reportRequest = await BuildRequestAsync(processedData);

      // synchronous building
      var report = reportRequest.BuildReport();

      return (Key: data.ReportId, Value: report);
    }
    finally
    {
      throttle.Release();
    }
  })).ToList();
  var results = await Task.WhenAll(tasks);
  return results.ToDictionary(x => x.Key, x => x.Value);
}
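
A later aside, not part of the original answer: the `Parallel.ForEachAsync` API mentioned in the question's comments shipped in .NET 6, and it covers this mixed sync/async shape with a built-in concurrency limit. The following is only a sketch under that assumption; the stand-in methods and the `int`/`string` types are hypothetical and exist just to make the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ForEachAsyncSketch
{
    // Hypothetical stand-ins for ProcessData, BuildRequestAsync and BuildReport.
    static int ProcessData(int id) => id * 2;
    static async Task<int> BuildRequestAsync(int processed)
    {
        await Task.Delay(10);
        return processed + 1;
    }
    static string BuildReport(int request) => $"report-{request}";

    public static async Task<IDictionary<int, string>> RunAsync(IEnumerable<int> ids)
    {
        var results = new ConcurrentDictionary<int, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // Requires .NET 6 or later. The body may mix synchronous and awaited work.
        await Parallel.ForEachAsync(ids, options, async (id, cancellationToken) =>
        {
            var processed = ProcessData(id);                   // synchronous
            var request = await BuildRequestAsync(processed);  // asynchronous
            results[id] = BuildReport(request);                // synchronous
        });

        // ForEachAsync completes only after every iteration has finished.
        return results;
    }
}
```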
Stephen Cleary
  • Thanks @Stephen - Can always rely on you to provide comprehensive answers to async questions. The Task.Run() with semaphore looks like it should do the trick. TPL Dataflow may be a superior solution, but it looks like it would add complexity and maintenance headaches in the future, so this seems like a good compromise. One of those simple answers where I wonder why I didn't think of it myself. – Mog0 Jan 26 '21 at 13:51
  • I think you may have made a small error. The async should be on the lambda inside the Task.Run, not the one inside the Select. The awaits are all in the Task.Run() lambda. – Mog0 Jan 26 '21 at 14:13
  • *"Parallel will automatically determine the appropriate number of threads and adjust based on usage."* This sounds nicer than what actually happens. Recently I realized that the default value of `ParallelOptions.MaxDegreeOfParallelism` is `-1`, which means unbounded parallelism. In other words, if `Parallel.ForEach` is invoked with its default options, like in this question's code sample, it will just saturate the `ThreadPool` and will keep it saturated until the source enumerable completes. Ouch! – Theodor Zoulias Jan 26 '21 at 14:50
  • Yes, `Parallel` places the code on the thread pool, and lets the thread pool manage adjustments. Note that it also uses intelligent partitioning, so it's not just throwing out a task per item. `PLINQ` I believe tries to use all the *cores*, which is generally worse. E.g., two `Parallel` calls can coexist; two `PLINQ` calls will interfere. – Stephen Cleary Jan 26 '21 at 15:28
  • By saturating the `ThreadPool`, `Parallel` interferes with everything else the program may be doing concurrently. Async continuations and `System.Timers.Timer` handlers will all be affected. That's poor behavior IMHO. Personally, I don't think I'll ever use `Parallel` again without specifying a reasonable `MaxDegreeOfParallelism`. AFAIK the PLINQ default is `Environment.ProcessorCount`, which is reasonable. The upcoming `Parallel.ForEachAsync` is probably going to have the same default as PLINQ. – Theodor Zoulias Jan 26 '21 at 15:55
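
For reference, the explicit limit discussed in the comments above is set through `ParallelOptions`. This is a minimal sketch for purely CPU-bound work; `SlowSquare` is a hypothetical stand-in, and `Environment.ProcessorCount` is just one reasonable choice of limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Hypothetical CPU-bound work: computes n * n by repeated addition.
    static long SlowSquare(int n) => Enumerable.Range(0, n).Sum(_ => (long)n);

    public static ConcurrentDictionary<int, long> Run(int[] inputs)
    {
        var results = new ConcurrentDictionary<int, long>();

        // Cap the loop instead of relying on the default of -1 (no explicit limit).
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // Parallel.ForEach blocks until every iteration has completed.
        Parallel.ForEach(inputs, options, n => results[n] = SlowSquare(n));
        return results;
    }
}
```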