2

When using Parallel.ForEach, would converting any DB or API calls to async methods improve performance?

A bit of background, I currently have a console application that sequentially loops through a bunch of files and for each one calls an API and makes some DB calls. The main logic looks like this:

foreach (var file in files)
{
    ReadTheFileAndComputeAFewThings(file);
    CallAWebService(file);
    MakeAFewDbCalls(file);
}

Currently all of the DB and web service calls are synchronous.

Changing the loop to use Parallel.ForEach has given me a massive performance increase, just as you would expect.

I am wondering if I kept the Parallel.ForEach call there, and inside the loop, change all of the webservice calls to be async (eg, HttpClient.SendAsync) and DB calls to be async (using Dapper, db.ExecuteAsync()) - would that increase performance of the application by allowing it to re-use threads? Or would it effectively do nothing as Parallel.ForEach is taking care of the thread allocation anyway?

Rocklan
  • What would be better is to make as few calls as possible, so write the SQL so that the DB calls can be made once. – Seabizkit Jan 20 '20 at 05:18
  • Yes, it would improve performance if the amount of time spent waiting exceeds the amount of work the thread could have been doing, so it depends on what you are looping over. It could worsen performance as well; it depends on how many loops there are and how quickly they run. Again, you're better off reducing calls where you can, e.g. get all the data for all the files and then loop over it, referencing memory rather than making a call each time. – Seabizkit Jan 20 '20 at 05:21
  • Could you please clarify if the question is "Would making DB calls *async inside a Parallel.ForEach loop* improve performance" or "Would converting all calls to async running in parallel improve performance compared to Parallel.ForEach"? If former - please [edit] post to clarify what you plan to do (as `async` + `Parallel.ForEach` requires solid understanding of both and more... hence chances a random user to get it right are low... so that approach totally depends on how badly you implement it :) ) – Alexei Levenkov Jan 20 '20 at 05:40
  • @Seabizkit pure `Parallel.ForEach` is easy to implement (already shown in the post) and pure `async` with `.WhenAll` is easy to implement... Getting `async` inside `Parallel.ForEach` is a major pain. So while the pure approaches will lead to comparable perf and correctness (making it an opinion-based question), mixing the two will cause a major headache and could be factually answered based on how broken the proposed solution would be. – Alexei Levenkov Jan 20 '20 at 05:57
  • Thanks very much @AlexeiLevenkov - I've updated the question. I mean keeping the parallel.foreach and making stuff inside the loop async. Wondering if it would help or not. – Rocklan Jan 20 '20 at 06:01
  • @Rocklan ok you are indeed asking about "async inside Parallel.Foreach". As I said it completely depends on how badly you #$@ that up :). You may want to read https://stackoverflow.com/questions/11564506/nesting-await-in-parallel-foreach if you *really* want to try it out... Basically performance would not matter as you can't sensibly do that :) – Alexei Levenkov Jan 20 '20 at 06:05
  • @AlexeiLevenkov apologies, I deleted my comment as I saw what you were saying.... – Seabizkit Jan 20 '20 at 06:06

3 Answers

5

The answer is no. Asynchrony offers scalability, not performance. It lets you do the same job with fewer threads, and so with less memory (each blocked thread = 1 MB of wasted memory).

It’s important to keep in mind, though, that asynchronicity is not a performance optimization for an individual operation. Taking a synchronous operation and making it asynchronous will invariably degrade the performance of that one operation, as it still needs to accomplish everything that the synchronous operation did, but now with additional constraints and considerations.

It should be noted that the Parallel.ForEach API cannot be used with an asynchronous body delegate. Using async with this API is a bug. The correct API to use when you want to parallelize asynchronous operations is Parallel.ForEachAsync, available from .NET 6 onwards.
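As a rough illustration of the Parallel.ForEachAsync approach, here is a minimal sketch. It assumes .NET 6 or later; `CallAWebServiceAsync` is a hypothetical stand-in for a real async web service or DB call (simulated with `Task.Delay`), and the file names and degree of parallelism are made up for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncSketch
{
    // Hypothetical stand-in for an async web service or DB call.
    static async Task<string> CallAWebServiceAsync(string file, CancellationToken ct)
    {
        await Task.Delay(50, ct); // simulated I/O latency
        return file + ":done";
    }

    public static async Task<string[]> ProcessAllAsync(string[] files, int maxParallel)
    {
        var results = new ConcurrentBag<string>(); // safe to add to from concurrent iterations
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // The body is a genuinely async delegate, and the returned Task
        // completes only after every iteration has finished.
        await Parallel.ForEachAsync(files, options, async (file, ct) =>
        {
            results.Add(await CallAWebServiceAsync(file, ct));
        });

        // Completion order is not deterministic, so sort before returning.
        return results.OrderBy(r => r, StringComparer.Ordinal).ToArray();
    }

    public static async Task Main()
    {
        var done = await ProcessAllAsync(new[] { "a.txt", "b.txt", "c.txt" }, maxParallel: 2);
        Console.WriteLine(string.Join(", ", done));
    }
}
```

`MaxDegreeOfParallelism` caps how many iterations are in flight at once, which is usually what you want when the work hits a shared database or web service.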

Theodor Zoulias
  • Note that this answer does not really apply to the question as asked, as the OP seems to highlight (by the function names in particular) that the calls are I/O bound and not CPU bound... Indeed, in that case converting *serial synchronous code* into parallel async or `Parallel.ForEach` would give essentially the same benefits, and performance may indeed improve when switching from Parallel to completely async... This answer also does not talk about the actual question, "async inside a Parallel.ForEach loop ..." Both of these concerns are indicated by +2 votes :) – Alexei Levenkov Jan 21 '20 at 18:43
  • @AlexeiLevenkov I edited the answer by removing the fluff, and mentioning the incompatibility between `Parallel.ForEach` and `async`. – Theodor Zoulias Jun 20 '23 at 15:20
1

Parallel.ForEach operates on tasks, not threads. This means it can spawn more tasks than you have threads in the thread pool. In this scenario, using async methods can give you a performance optimization by completing all the tasks with fewer threads.

https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netcore-3.1

The Parallel.ForEach method may use more tasks than threads over the lifetime of its execution, as existing tasks complete and are replaced by new tasks. This gives the underlying TaskScheduler object the chance to add, change, or remove threads that service the loop.

bss
  • Note that this post does not answer "Would making DB calls async inside a Parallel.ForEach loop improve performance?" as stated in title of the question... Which may be fine if OP actually wanted to ask the question this post answers... Commented on the question to ask for clarification. – Alexei Levenkov Jan 20 '20 at 05:41
0

original

foreach (var file in files)
{
    ReadTheFileAndComputeAFewThings(file);
    CallAWebService(file);
    MakeAFewDbCalls(file);
}

original + async (better than above, depending!)

foreach (var file in files)
{
   await ReadTheFileAndComputeAFewThings(file);
   await CallAWebService(file);
   await MakeAFewDbCalls(file);
}

This will not be better if the calls do not actually implement async; in that case it will be worse. It can also be worse if the asynchronous work is so short that the cost of the Task outweighs it. Each awaited call allocates a Task and a state machine and adds a little scheduling overhead (a Task is not a thread, so it does not reserve a 1 MB thread stack). Although that overhead is extremely low, in a tight loop it can show up as a performance issue.

The key here is that the methods called must actually be the async versions:

  • SaveChanges vs SaveChangesAsync

  • Read vs ReadAsync


Parallel (better than above, depending!)

Parallel.ForEach(files, item =>
{
    ReadTheFileAndComputeAFewThings(item);
    CallAWebService(item);
    MakeAFewDbCalls(item);
});

If these can all happen at the same time, then this is better, but only if you are willing to dedicate multiple threads and resources to it. Remember that resources are limited: your machine only has so many cores and so much RAM, so you will want to manage this depending on what else the hardware is responsible for.

Not better if the methods are not thread safe.
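To make the thread-safety point concrete, here is a small sketch, not taken from the original code: the shared counter and the `CountSafely` name are hypothetical. Any state touched from inside the loop body has to be protected, for example with `Interlocked`, because several iterations run at the same time.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadSafetySketch
{
    public static int CountSafely(int items)
    {
        int processed = 0;

        Parallel.ForEach(Enumerable.Range(0, items), _ =>
        {
            // A plain "processed++" here would be a data race and could lose updates.
            Interlocked.Increment(ref processed);
        });

        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(CountSafely(10_000));
    }
}
```

The same reasoning applies to anything the three methods share between iterations, such as a single DB connection or a non-thread-safe cache.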


Parallel + async (better than above, depending!)

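Parallel.ForEach(files, async item =>
{
   // Broken pattern: the lambda becomes async void, so Parallel.ForEach
   // returns before the awaited work has finished.
   await ReadTheFileAndComputeAFewThings(item);
   await CallAWebService(item);
   await MakeAFewDbCalls(item);
});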

FYI: the Parallel + async example above is actually incorrect! Parallel.ForEach itself is not async, so it does not wait for the awaited work. You need an async-aware alternative instead, such as Parallel.ForEachAsync (.NET 6 and later) or Task.WhenAll.

Also the same comments above apply when using in conjunction.

Update

Based on a comment: this largely depends on whether ConfigureAwait() has been set, but assuming it has not, see the versions below. Note also that these will not execute in order, so if CallAWebService depends on ReadTheFileAndComputeAFewThings, things will probably go wrong.

foreach (var file in files)
{
   // Start all three calls for this file, then wait for all of them.
   var jobs = new List<Task>();
   jobs.Add(ReadTheFileAndComputeAFewThings(file));
   jobs.Add(CallAWebService(file));
   jobs.Add(MakeAFewDbCalls(file));
   await Task.WhenAll(jobs);
}

or...

// Start every call for every file, then wait for all of them at once.
var jobs = new List<Task>();
foreach (var file in files)
{
   jobs.Add(ReadTheFileAndComputeAFewThings(file));
   jobs.Add(CallAWebService(file));
   jobs.Add(MakeAFewDbCalls(file));
}
await Task.WhenAll(jobs);

The difference between the two is that the second has many more tasks in flight at once, and you will probably run into issues with it regarding context: the enumerator may no longer have the correct "index" to the file, and any call that depends on another having completed first will break.
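One way to keep the steps for a single file in order while still running files concurrently is sketched below. This is only an illustration under assumptions: `ReadAsync`, `CallServiceAsync` and `SaveAsync` are hypothetical async versions of the three steps (simulated with `Task.Delay`), and the `SemaphoreSlim` limit is an arbitrary cap on how many files are processed at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical async versions of the three steps; Task.Delay simulates I/O.
    static async Task<string> ReadAsync(string file) { await Task.Delay(10); return file + ">read"; }
    static async Task<string> CallServiceAsync(string s) { await Task.Delay(10); return s + ">web"; }
    static async Task<string> SaveAsync(string s) { await Task.Delay(10); return s + ">db"; }

    // Steps for ONE file run in order: each await completes before the next call starts.
    static async Task<string> ProcessFileAsync(string file, SemaphoreSlim gate)
    {
        await gate.WaitAsync(); // wait for a free slot
        try
        {
            var read = await ReadAsync(file);
            var called = await CallServiceAsync(read);
            return await SaveAsync(called);
        }
        finally
        {
            gate.Release(); // free the slot even if a step throws
        }
    }

    public static async Task<string[]> ProcessAllAsync(IEnumerable<string> files, int maxConcurrent)
    {
        using var gate = new SemaphoreSlim(maxConcurrent);

        // One task per file; different files overlap, up to maxConcurrent at a time.
        var jobs = files.Select(f => ProcessFileAsync(f, gate)).ToList();

        // Results come back in the same order as the input files.
        return await Task.WhenAll(jobs);
    }

    public static async Task Main()
    {
        var results = await ProcessAllAsync(new[] { "a", "b", "c" }, maxConcurrent: 2);
        Console.WriteLine(string.Join(Environment.NewLine, results));
    }
}
```

Because each file gets its own task, a dependency such as "the web service call needs the result of reading the file" is preserved, while unrelated files still overlap.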

Amazing link explaining async... https://learn.microsoft.com/en-us/archive/blogs/benwilli/tasks-are-still-not-threads-and-async-is-not-parallel

Seabizkit
  • You really should do `.WhenAll` instead of `foreach` in `async` version (or at least show one)... there is no useful gains without it. – Alexei Levenkov Jan 20 '20 at 06:20
  • @AlexeiLevenkov updated based on this... I get what you're saying, but it partly depends on how the code inside the methods looks. – Seabizkit Jan 20 '20 at 06:40
  • Would love to hear from the marker down, as to how this is not helpful. Mind blown. – Seabizkit Jan 20 '20 at 10:11