
Consider: you have a collection of user ids and want to load the details of each user represented by their id from an API. You want to bag up all of those users into some kind of collection and send it back to the calling code. And you want to use LINQ.

Something like this:

var userTasks = userIds.Select(userId => GetUserDetailsAsync(userId));
var users = await Task.WhenAll(userTasks); // users is User[]

This was fine for my app when I had relatively few users, but there came a point where it didn't scale. When we got to thousands of users, this resulted in thousands of HTTP requests being fired concurrently, and bad things started to happen. Not only did we realise we were effectively launching a denial-of-service attack on the API we were consuming as we did this, we were also bringing our own application to the point of collapse through thread starvation.

Not a proud day.

Once we realised that the cause of our woes was a Task.WhenAll / Select combo, we were able to move away from that pattern. But my question is this:

What is going wrong here?

As I read around on the topic, this scenario seems well described by #6 on Mark Heath's list of Async antipatterns: "Excessive parallelization":

Now, this does "work", but what if there were 10,000 orders? We've flooded the thread pool with thousands of tasks, potentially preventing other useful work from completing. If ProcessOrderAsync makes downstream calls to another service like a database or a microservice, we'll potentially overload that with too high a volume of calls.

Is this actually the reason? I ask as my understanding of async / await becomes less clear the more I read about the topic. It's very clear from many pieces that "threads are not tasks". Which is cool, but my code appears to be exhausting the number of threads that ASP.NET Core can handle.

So is that what it is? Is my Task.WhenAll and Select combo exhausting the thread pool or similar? Or is there another explanation for this that I'm not aware of?

Update:

I turned this question into a blog post with a little more detail / waffle. You can find it here: https://blog.johnnyreilly.com/2020/06/taskwhenall-select-is-footgun.html

John Reilly
  • It's not necessarily thread pool starvation, it's just a natural result of not [throttling your tasks](https://stackoverflow.com/q/22492383/11683). – GSerg Jun 20 '20 at 18:39

3 Answers


N+1 Problem

Putting threads, tasks, async, parallelism to one side, what you describe is an N+1 problem, which is something to avoid for exactly what happened to you. It's all well and good when N (your user count) is small, but it grinds to a halt as the users grow.

You may want to find a different solution. Do you have to do this operation for all users? If so, then maybe switch to a background process and fan-out for each user.
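For illustration, here's a minimal sketch of that fan-out idea using System.Threading.Channels; the worker count, the channel capacity, and the surrounding wiring are all assumptions on my part, not anything from the question:

using System.Threading.Channels;

// A bounded channel of user ids: a producer fills it, a fixed number of
// workers drain it, so concurrency never exceeds the worker count no
// matter how many users there are.
var channel = Channel.CreateBounded<string>(capacity: 100);

var producer = Task.Run(async () =>
{
    foreach (var userId in userIds)
        await channel.Writer.WriteAsync(userId);
    channel.Writer.Complete();
});

var workers = Enumerable.Range(0, 5) // 5 workers, an arbitrary choice
    .Select(_ => Task.Run(async () =>
    {
        await foreach (var userId in channel.Reader.ReadAllAsync())
        {
            var user = await GetUserDetailsAsync(userId);
            // ... store or publish the result here ...
        }
    }))
    .ToArray();

await Task.WhenAll(workers.Append(producer));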

Back to the footgun (I had to look that up, BTW).

Tasks are a promise, similar to promises in JavaScript. In .NET they may complete on a separate thread - usually a thread from the thread pool.

In .NET Core, if the task is not complete at the point of awaiting, the continuation usually does run on a separate thread, and for an HTTP request that is almost certain to be the case.

You may have exhausted the thread pool, but since you're making HTTP requests, I suspect you've exhausted the number of concurrent outbound HTTP requests instead. "The default connection limit is 10 for ASP.NET hosted applications and 2 for all others." See the documentation here.
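If that connection limit is the constraint, it can be raised; a quick sketch (the values of 100 are illustrative, not recommendations):

using System.Net;
using System.Net.Http;

// Classic .NET Framework / ServicePointManager-based stack:
ServicePointManager.DefaultConnectionLimit = 100;

// .NET Core / .NET 5+: HttpClient sits on SocketsHttpHandler, whose
// MaxConnectionsPerServer defaults to int.MaxValue, so any cap is set
// per handler instead.
var handler = new SocketsHttpHandler { MaxConnectionsPerServer = 100 };
var client = new HttpClient(handler);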

Is there a way to achieve some parallelism and not exhaust a resource (threads or HTTP connections)? Yes.

Here's a pattern I often implement for just this reason, using Batch() from morelinq.

IEnumerable<User> users = Enumerable.Empty<User>();
IEnumerable<IEnumerable<string>> batches = userIds.Batch(10);
foreach (IEnumerable<string> batch in batches)
{
    // Kick off at most 10 requests, then wait for the whole batch.
    IEnumerable<Task<User>> batchTasks = batch.Select(userId => GetUserDetailsAsync(userId));
    User[] batchUsers = await Task.WhenAll(batchTasks);
    users = users.Concat(batchUsers);
}

You still get asynchronous HTTP requests to GetUserDetailsAsync(), ten at a time, and you don't exhaust threads or concurrent HTTP connections (or at least you max out at 10).

Now if this is a heavily used operation, or the server behind GetUserDetailsAsync() is heavily used elsewhere in the app, you may hit the same limits when your system is under load, so this batching is not always a good idea. YMMV.

James Skimming
  • You start another batch of 10. 9 out of 10 complete almost instantly. The last one takes 11 minutes to complete. For these 11 minutes, you are waiting for that one task, not starting any other tasks, even though your limit for the number of active tasks is 10, and you could have started 9 more tasks 11 minutes ago. – GSerg Jun 20 '20 at 19:46
  • Thanks for the detailed answer James; I appreciate that! Yeah I solved the problem by using a similar approach to the fan-out you allude to, but client-side. I wrote a React hook that gradually loads the data from the client and amalgamates it there. It throttles the number of concurrent requests to ease load on the server and consequently the API it depends upon. I will probably blog about it at some point. – John Reilly Jun 20 '20 at 21:19
  • So the great news is I no longer have a coding problem to solve. But I do have a puzzle; the point that somewhat perplexes me is what the underlying issue is. Can the Task.WhenAll with Select combo exhaust the thread pool, or do I not understand this correctly? I'm trying to comprehend what problems my original approach caused; I have a feeling there's something about async / await's implementation in .NET Core I'm not clear on. And a weird detail I didn't mention: I could never replicate the issue on my development machine, which is Windows. We run in Docker (Linux), where the issue presents. Relevant? – John Reilly Jun 20 '20 at 21:24
  • Incidentally the ServicePointManager documentation link was helpful; it led me to Steve Gordon's post on HttpClient connection pooling, a topic I didn't know much about before! https://www.stevejgordon.co.uk/httpclient-connection-pooling-in-dotnet-core – John Reilly Jun 21 '20 at 11:17
  • Once using MoreLinq, you can also use `.Await`, which can preserve ordering and cap concurrent operations _without_ needing to batch them – Arithmomaniac Jun 22 '20 at 06:53
  • Note: you no longer need MoreLinq just for the Batch() method, as there is .Chunk() in System.Linq as of net6.0: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.chunk?view=net-6.0 – MemeDeveloper Jun 14 '23 at 12:46
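Following up on that last comment, the same batching loop with the built-in .Chunk() (net6.0 or later) and no MoreLinq dependency would look something like this:

IEnumerable<User> users = Enumerable.Empty<User>();
foreach (string[] batch in userIds.Chunk(10)) // Chunk yields arrays of up to 10 ids
{
    IEnumerable<Task<User>> batchTasks = batch.Select(userId => GetUserDetailsAsync(userId));
    users = users.Concat(await Task.WhenAll(batchTasks));
}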

You already have an excellent answer here, but just to chime in:

There's no problem with creating thousands of tasks. They're not threads.

The core problem is that you're hitting the API way too much. So the best solutions are going to change how you call that API:

  1. Do you really need user details for thousands of users, all at once? If this is for a dashboard display, then change your API to enforce paging; if this is for a batch process, then see if you can access the data directly from the batch process.
  2. Use a batch route for that API if it supports one.
  3. Use caching if possible.
  4. Finally, if none of the above are possible, look into throttling the API calls.

The standard pattern for asynchronous throttling is to use SemaphoreSlim, which looks like this:

using var throttler = new SemaphoreSlim(10);
var userTasks = userIds.Select(async userId =>
{
  // Wait for a free slot; at most 10 calls are in flight at once.
  await throttler.WaitAsync();
  try { return await GetUserDetailsAsync(userId); }
  finally { throttler.Release(); }
});
var users = await Task.WhenAll(userTasks); // users is User[]
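On .NET 6 or later, Parallel.ForEachAsync offers the same kind of throttling without managing the semaphore yourself; a sketch, where collecting into a ConcurrentBag is just one way to gather the results:

using System.Collections.Concurrent;

var users = new ConcurrentBag<User>();
await Parallel.ForEachAsync(
    userIds,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async (userId, ct) =>
    {
        // At most 10 of these bodies run concurrently.
        users.Add(await GetUserDetailsAsync(userId));
    });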

Again, this kind of throttling is best only if you can't make the design changes to avoid thousands of API calls in the first place.

Stephen Cleary
  • I know you're the master of async, but saying that there's no problem with firing thousands of tasks at once, is that accurate? Even when reusing HttpClient, is it not possible to run into port exhaustion? I know I had that scenario at work with some code running in Azure, but we were NOT reusing HttpClient then. We changed the code to reuse it and also did parallel batches – GR7 Jun 27 '20 at 05:08
  • There's no problem with thousands of tasks. There are problems with thousands of HTTP connections. I should have been more clear, sorry. – Stephen Cleary Jun 27 '20 at 13:12
  • @StephenCleary: I still do not buy that thousands of tasks are not a problem. Even if everything is async and no thread is waiting for external resources distributed across many machines, you still have the implicit assumption that they complete in a sane manner (which may not be true). What will happen if your thousands of tasks complete at once and you then do resource-intensive operations? Async/await is great to free up the UI thread, but since the programming model is logically still single-threaded you can only interleave more work in between on central threads. – Alois Kraus Jun 27 '20 at 15:09
  • Thanks Stephen - I really appreciate your contribution! I solved it with throttling in the end, but on the client rather than the server. The server would have worked, but it led to another problem: the request from client to server timed out as the aggregation took place on the server (more than 60 seconds and nginx kicked us out). There was actually a nice side effect from doing it that way; users could see loading progress tick up. Which looked cool! – John Reilly Jul 08 '20 at 15:53

While no thread is blocked waiting on a pure async operation, a thread is still needed for the continuation. Assuming your GetUserDetailsAsync awaits some IO-bound operation, the continuation (parsing the output, returning the result, and so on) has to run on some thread so that the result of the Task created by GetUserDetailsAsync can be set. So every one of those thousands of tasks will, at some point, wait for a thread from the thread pool in order to finish.
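To illustrate, here's a hypothetical GetUserDetailsAsync; the exact calls and the httpClient are assumptions, the point is the split between the IO-bound part and the continuation:

using System.Text.Json;

async Task<User> GetUserDetailsAsync(string userId)
{
    // IO-bound part: no thread is blocked while the request is in flight.
    var json = await httpClient.GetStringAsync($"/users/{userId}");

    // Continuation: everything after the await still needs a thread
    // (by default a thread pool thread) to run and to set the result
    // of the Task returned by this method.
    return JsonSerializer.Deserialize<User>(json)!;
}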

Guru Stron