
In my ASP.NET Core controller, I have an Action which looks something like this:

public async Task<ActionResult> SomeAction(int someId)
{
    int[] itemIds = await _service.GetSomeItems(someId);
    var model = new List<ItemDetail>(itemIds.Length);

    foreach (int id in itemIds)
        model.Add(await _service.GetItemDetails(id));
    return View(model);
}

It works, but it's pretty slow. I want to improve the performance of "SomeAction" by spawning multiple Tasks in parallel:

public async Task<ActionResult> SomeAction(int someId)
{
    int[] itemIds = await _service.GetSomeItems(someId);
    var tasks = new List<Task<ItemDetail>>();

    foreach (int id in itemIds)
        tasks.Add(_service.GetItemDetails(id));
    ItemDetail[] model = await Task.WhenAll(tasks);

    return View(model);
}

This works better, but I'm not sure whether this approach is good practice. Normally the _service.GetSomeItems API returns 20-30 elements, but sometimes it can return 100-200 elements. In my case _service is a 3rd-party REST API with a latency of 100-150 ms.

My question is: how many Tasks can I spawn at once? Will my code lead to ThreadPool starvation if multiple requests come in simultaneously (each of them spawning 20-200 Tasks)? Is my approach generally bad practice?

What I'm trying to avoid is ThreadPool starvation, which may end up with a deadlock in some scenarios.

Update: _service.GetItemDetails(id) is a 3rd-party Web API; request latency is 100-150 ms.

Tiber Septim
  • "It works, but pretty slow." - why is it slow? is it a database call? Assuming you need multiple threads should not be your first approach... Please show GetItemDetails() – Mitch Wheat Oct 26 '21 at 09:29
  • We can't help you without seeing inside `GetItemDetails`. – Dai Oct 26 '21 at 09:29
  • We can't comment on the pros and cons of a black box. – TheGeneral Oct 26 '21 at 09:30
  • Also, are you using `IHttpClientFactory`? Or are you doing `new HttpClient()` for every request? You might fall victim to socket exhaustion - and if you're caching a single `HttpClient` then you'll have a stale DNS cache. – Dai Oct 26 '21 at 09:30
  • Generally, in a web site, you want to leave aside thoughts of running code in parallel - you already have lots of demand for parallel activity due to *other requests the site is servicing*. Unless you're actually developing an application that is dedicated to a single user. – Damien_The_Unbeliever Oct 26 '21 at 09:33
  • GetItemDetails is a 3rd-party HTTP REST web service. The problem is latency due to the 6000-8000 km distance between servers; latency is 100-150 ms. But the real problem is not the latency itself, it's the loop of `await GetItemDetails()` calls running sequentially one after another. – Tiber Septim Oct 26 '21 at 09:34
  • Classic N + 1 data access problem. you'd obviously be better off hitting an endpoint once and retrieving all the data at once.... – Mitch Wheat Oct 26 '21 at 09:37
  • @Dai yes I'm using IHttpClientFactory. I don't have problems with HttpClient. Problem is in multiple sequential run of async operation. – Tiber Septim Oct 26 '21 at 09:37
  • @MitchWheat Unless you're using a terrible third-party web-service that doesn't offer any way of getting lists-of-items. I run into those kinds of services all the time when I'm doing integration work. My main workaround is to employ aggressive caching, also consider using a separate background process that spams-and-stores each remote entity in a local database. – Dai Oct 26 '21 at 09:39
  • 1
    The other thing you need to consider is if the endpoint will throttle you (or even block you if you hit it so hard that it considers it a DDOS) – Mitch Wheat Oct 26 '21 at 09:39
  • @MitchWheat yes I agree, but unfortunately we can't force our 3rd Party Service provider to make a better API for us. At least that won't be done soon enough. – Tiber Septim Oct 26 '21 at 09:40
  • @Dai: I am indeed fortunate that I am not currently dealing with any "terrible third-party web-services" – Mitch Wheat Oct 26 '21 at 09:40
  • @Dai same problem here. :\ I also do cache via Redis. – Tiber Septim Oct 26 '21 at 09:40
  • @Dai, erm yes I know that. Looks like we are done here. – Mitch Wheat Oct 26 '21 at 09:41
  • @Dai: I did. And is that a problem?. I meant endpoint originally. I was simply using the posters service type. – Mitch Wheat Oct 26 '21 at 09:45
  • @MitchWheat Then we're cool then :) – Dai Oct 26 '21 at 09:51
  • You might find this useful: [How to limit the amount of concurrent async I/O operations?](https://stackoverflow.com/questions/10806951/how-to-limit-the-amount-of-concurrent-async-i-o-operations/) – Theodor Zoulias Oct 26 '21 at 10:09

1 Answer


It's not good practice. The problem isn't how many tasks you can create, but how many requests the remote server will accept before blocking or throttling you, plus the network overhead, the latency, and the stress on your own network. You're making network calls, which means your CPU has almost nothing to do; it just waits for responses from the remote server(s).

Then there's the overhead on the server side: opening one connection per request, executing a single query, serializing the results. If the server's code isn't well written, it may crash. If the load balancer isn't well configured, all 1000 concurrent calls may end up getting served by the same machine, and those 1000 requests may open 1000 connections that block each other.

If the remote API allows it, try to request multiple items at once. You only pay the overhead once and the server only needs to execute a single query.
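For illustration, assuming the remote API exposed a hypothetical batch endpoint such as `GET /items?ids=1,2,3` (that endpoint, and the names below, are assumptions, not part of the real service), the N round-trips could collapse into one:

```csharp
// Hypothetical sketch: fetch many item details in a single request.
// The "items?ids=..." endpoint is assumed to exist on the remote API.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class BatchClient
{
    private readonly HttpClient _http;

    public BatchClient(HttpClient http) => _http = http;

    // Builds e.g. "items?ids=1,2,3" - one round-trip instead of itemIds.Length.
    public static string BuildBatchUrl(int[] itemIds) =>
        $"items?ids={string.Join(",", itemIds)}";

    public async Task<string> GetItemDetailsBatchAsync(int[] itemIds) =>
        await _http.GetStringAsync(BuildBatchUrl(itemIds));
}
```

With 100-150 ms of latency per call, turning 200 sequential calls into one batch call is the single biggest win available, if the provider supports it.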

Otherwise, you can execute multiple requests concurrently, but not all at once. It's far better to throttle the requests by limiting how many can run concurrently, and possibly how frequently they are issued.
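On runtimes older than .NET 6, a common way to cap concurrency is a `SemaphoreSlim` gate. This is a self-contained sketch: `GetItemDetailsAsync` is a stand-in simulating the third-party call, and the 100 ms delay and the limit of 20 are illustrative assumptions, not measured values:

```csharp
// Sketch: limit in-flight async calls with SemaphoreSlim.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledFetch
{
    // Stand-in for _service.GetItemDetails(id); assumed I/O-bound.
    static async Task<string> GetItemDetailsAsync(int id)
    {
        await Task.Delay(100); // simulate ~100 ms network latency
        return $"detail-{id}";
    }

    public static async Task<string[]> GetAllDetailsAsync(
        IEnumerable<int> ids, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = ids.Select(async id =>
        {
            await gate.WaitAsync();   // at most maxConcurrency calls in flight
            try
            {
                return await GetItemDetailsAsync(id);
            }
            finally
            {
                gate.Release();
            }
        });
        return await Task.WhenAll(tasks);
    }
}
```

With a limit of 20 and ~100 ms per call, 200 items complete in roughly 10 batches (~1 second) instead of ~20-30 seconds sequentially, without flooding the remote server.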

In .NET 6 (the next LTS release, already supported in production) you can use Parallel.ForEachAsync with a limited degree of parallelism:

var queue = new ConcurrentQueue<Something>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };

await Parallel.ForEachAsync(itemIds, options, async (id, token) =>
{
    var result = await _service.GetItemDetails(id);
    var something = CreateSomething(result);
    queue.Enqueue(something);
});

model.Items = queue.ToList();

return View(model);

Another option is to use Channels or TPL Dataflow blocks to create a processing pipeline that processes items concurrently, e.g. the first step generating the IDs, the next retrieving the remote items, and the final one producing the result:

ChannelReader<int> GetSome(int someId, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<int>();

    //Write in the background so the reader can be returned immediately
    async Task WriteAsync()
    {
        try
        {
            int[] itemIds = await _service.GetSomeItems(someId);
            foreach (var id in itemIds)
            {
                if (token.IsCancellationRequested)
                {
                    break;
                }
                channel.Writer.TryWrite(id);
            }
            channel.Writer.TryComplete();
        }
        catch (Exception exc)
        {
            channel.Writer.TryComplete(exc);
        }
    }

    _ = WriteAsync();
    return channel.Reader;
}


ChannelReader<Detail> GetDetails(ChannelReader<int> reader, int dop,
    CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<Detail>();
    var writer = channel.Writer;

    var options = new ParallelOptions {
        MaxDegreeOfParallelism = dop,
        CancellationToken = token
    };

    var worker = Parallel.ForEachAsync(reader.ReadAllAsync(token),
        options,
        async (id, ct) =>
        {
            try
            {
                var details = await _service.GetItemDetails(id);
                await writer.WriteAsync(details, ct);
            }
            catch (Exception exc)
            {
                //Handle the exception so we can keep processing
                //other messages
            }
        });
    worker.ContinueWith(t => writer.TryComplete(t.Exception));

    return channel.Reader;
}

async Task<Model> CreateModel(ChannelReader<Detail> reader,
    CancellationToken token = default)
{
    var allDetails = new List<Detail>(1000);
    await foreach (var detail in reader.ReadAllAsync(token))
    {
        allDetails.Add(detail);
        //Do other heavyweight things
    }
    return new Model { Details = allDetails };
}

These methods can be chained in a pipeline:

var chIds=GetSome(123);
var chDetails=GetDetails(chIds,20);
var model=await CreateModel(chDetails);

This is a common pattern when using Channels. In languages like Go, channels are used to create multi-step processing pipelines.

Converting these methods into extension methods allows creating the pipeline in a fluent manner:

static ChannelReader<Detail> GetDetails(this ChannelReader<int> reader,int dop,CancellationToken token=default)

static async Task<Model> CreateModel(this ChannelReader<Detail> reader,...)


var model = await GetSome(123)
                 .GetDetails(20)
                 .CreateModel();
Panagiotis Kanavos