
I have a web service that receives multiple requests at the same time. For each request, I need to call another web service (authentication things). The problem is that if multiple (>20) requests arrive at the same time, the response time suddenly gets a lot worse.

I made a sample to demonstrate the problem:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

namespace CallTest
{
    public class Program
    {
        private static readonly HttpClient _httpClient = new HttpClient(new HttpClientHandler { Proxy = null, UseProxy = false });

        static void Main(string[] args)
        {
            ServicePointManager.DefaultConnectionLimit = 100;
            ServicePointManager.Expect100Continue = false;

            // warmup
            CallSomeWebsite().GetAwaiter().GetResult();
            CallSomeWebsite().GetAwaiter().GetResult();

            RunSequentiell().GetAwaiter().GetResult();

            RunParallel().GetAwaiter().GetResult();
        }

        private static async Task RunParallel()
        {
            var tasks = new List<Task>();
            for (var i = 0; i < 300; i++)
            {
                tasks.Add(CallSomeWebsite());
            }
            await Task.WhenAll(tasks);
        }

        private static async Task RunSequentiell()
        {
            // no task list needed here; each call completes before the next starts
            for (var i = 0; i < 300; i++)
            {
                await CallSomeWebsite();
            }
        }

        private static async Task CallSomeWebsite()
        {
            var watch = Stopwatch.StartNew();
            using (var result = await _httpClient.GetAsync("http://example.com").ConfigureAwait(false))
            {
                // more work here, like checking success etc.
                Console.WriteLine(watch.ElapsedMilliseconds);
            }
        }
    }
}

Sequential calls are no problem. They take a few milliseconds to finish and the response time is mostly the same.

However, parallel requests start taking longer and longer the more of them are sent. Sometimes a request even takes a few seconds. I tested it on .NET Framework 4.6.1 and on .NET Core 2.0 with the same results.

What is even stranger: I traced the HTTP requests with Wireshark, and they always take around the same time. But the sample program reports much higher values for the parallel requests than Wireshark does.

How can I get the same performance for parallel requests? Is this a thread pool issue?

Manuel Allenspach
  • To clarify, with `DefaultConnectionLimit` set to 100, you're still seeing a slowdown for 25 concurrent requests? Are you sure the server isn't throttling you? – Stephen Cleary Aug 16 '17 at 15:58
  • @StephenCleary Yes, the problem still persists. The server doesn't throttle me; Wireshark reports that all requests finished in far less time than what we see in the console output. – Manuel Allenspach Aug 16 '17 at 16:16

4 Answers


This behaviour was fixed in .NET Core 2.1. I think the problem was the underlying Windows WinHTTP handler, which was used by HttpClient.

In .NET Core 2.1, they rewrote the HttpClientHandler (see https://blogs.msdn.microsoft.com/dotnet/2018/04/18/performance-improvements-in-net-core-2-1/#user-content-networking):

In .NET Core 2.1, HttpClientHandler has a new default implementation implemented from scratch entirely in C# on top of the other System.Net libraries, e.g. System.Net.Sockets, System.Net.Security, etc. Not only does this address the aforementioned behavioral issues, it provides a significant boost in performance (the implementation is also exposed publicly as SocketsHttpHandler, which can be used directly instead of via HttpClientHandler in order to configure SocketsHttpHandler-specific properties).

This turned out to remove the bottlenecks mentioned in the question.
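
As a side note, a minimal sketch of using the new handler explicitly on .NET Core 2.1+ could look like this (the property values are assumptions mirroring the question's setup):

// Minimal sketch, assuming .NET Core 2.1+: construct HttpClient on top of
// SocketsHttpHandler directly instead of the HttpClientHandler default.
private static readonly HttpClient _httpClient = new HttpClient(new SocketsHttpHandler
{
    UseProxy = false,               // mirrors the question's HttpClientHandler setup
    MaxConnectionsPerServer = 100   // replaces ServicePointManager.DefaultConnectionLimit
});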

On .NET Core 2.0, I get the following numbers (in milliseconds):

Fetching URL 500 times...
Sequentiell   Total: 4209, Max:  35, Min: 6, Avg:  8.418
Parallel      Total:  822, Max: 338, Min: 7, Avg: 69.126

But on .NET Core 2.1, the individual parallel HTTP requests seem to have improved a lot:

Fetching URL 500 times...
Sequentiell   Total: 4020, Max:  40, Min: 6, Avg:  8.040
Parallel      Total:  795, Max:  76, Min: 5, Avg:  7.972
Manuel Allenspach

In the question's RunParallel() function, a stopwatch is started for all 300 calls within the first second of the program running, and each one is stopped only when its HTTP request completes.

Therefore these times can't really be compared to the sequential iterations.

For smaller numbers of parallel tasks, e.g. 50, if you measure the wall time that the sequential and parallel methods take, you should find that the parallel method is faster, because it pipelines as many GetAsync tasks as it can.
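
For example, a minimal sketch of that wall-time measurement, reusing the question's methods:

// Minimal sketch, assuming the RunSequentiell/RunParallel methods from the
// question: time each whole run instead of each individual request.
var total = Stopwatch.StartNew();
RunSequentiell().GetAwaiter().GetResult();
Console.WriteLine($"Sequential wall time: {total.ElapsedMilliseconds} ms");

total.Restart();
RunParallel().GetAwaiter().GetResult();
Console.WriteLine($"Parallel wall time: {total.ElapsedMilliseconds} ms");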

That said, when running the code for 300 iterations I did find a repeatable several-second stall, but only when running outside the debugger:

Debug build, in debugger: Sequential 27.6 seconds, parallel 0.6 seconds

Debug build, without debugger: Sequential 26.8 seconds, parallel 3.2 seconds

[Edit]

There's a similar scenario described in this question; it's possibly not relevant to your problem anyway.

This problem gets worse the more tasks are run, and disappears when:

  • Swapping the GetAsync work for an equivalent delay (sketched after this list)
  • Running against a local server
  • Slowing the rate of tasks creation / running less concurrent tasks
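
A minimal sketch of that first experiment, assuming a delay roughly matching the observed per-request time (the 50 ms figure is an assumption):

// Minimal sketch: swap the network call for an equivalent delay to take the
// host/network out of the picture. If the stall disappears, the client-side
// code is not the bottleneck. The 50 ms delay is an assumed stand-in.
private static async Task CallSomeDelay()
{
    var watch = Stopwatch.StartNew();
    await Task.Delay(50).ConfigureAwait(false); // stand-in for the HTTP call
    Console.WriteLine(watch.ElapsedMilliseconds);
}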

The watch.ElapsedMilliseconds diagnostic shows the stall for every connection, indicating that all connections are affected by the throttling.

Seems to be some sort of (anti-SYN-flood?) throttling in the host or network that just halts the flow of packets once a certain number of sockets start connecting.

Peter Wishart

It sounds like for whatever reason, you're hitting a point of diminishing returns at around 20 concurrent Tasks. So, your best option might be to throttle your parallelism. TPL Dataflow is a great library for achieving this. To follow your pattern, add a method like this:

private static Task RunParallelThrottled()
{
    // requires: using System.Threading.Tasks.Dataflow;
    var throttler = new ActionBlock<int>(i => CallSomeWebsite(),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

    for (var i = 0; i < 300; i++)
    {
        throttler.Post(i);
    }
    throttler.Complete();
    return throttler.Completion;
}

You might need to experiment with MaxDegreeOfParallelism until you find the sweet spot. Note that this is more efficient than doing batches of 20. In that scenario, all 20 in the batch would need to complete before the next batch begins. With TPL Dataflow, as soon as one completes, another is allowed to begin.
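
For contrast, a rough sketch of that batch approach (RunInBatches is a hypothetical method, shown only to illustrate the difference):

// Hypothetical sketch of the batching alternative: each group of 20 must
// finish before the next group starts, so one slow request stalls the batch.
private static async Task RunInBatches()
{
    for (var i = 0; i < 300; i += 20)
    {
        var batch = new List<Task>();
        for (var j = 0; j < 20; j++)
        {
            batch.Add(CallSomeWebsite());
        }
        await Task.WhenAll(batch); // waits on the slowest request in this batch
    }
}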

Todd Menier

The reason you are having issues is that .NET does not resume Tasks in the order they are awaited. An awaited Task is only resumed when the calling function cannot continue execution, and Task is not meant for parallel execution.

If you make a few modifications so that you pass i into the CallSomeWebsite function and call Console.WriteLine("All loaded"); after you add all the tasks to the list, you will get something like this (RequestNumber: Time):

All loaded
0: 164
199: 236
299: 312
12: 813
1: 837
9: 870
15: 888
17: 905
5: 912
10: 952
13: 952
16: 961
18: 976
19: 993
3: 1061
2: 1061

Do you notice how every Task is created before any of the times are printed out to the screen? The entire loop of creating Tasks completes before any of the Tasks resume execution after awaiting the network call.

Also, see how request 199 completed before request 1? .NET resumes Tasks in whatever order it deems best (it is surely more complicated than that, but I am not exactly sure how .NET decides which Task to continue).

One thing I think you might be confusing is asynchronous and parallel execution. They are not the same, and Task is used for asynchronous execution. That means all of these tasks are running on the same thread (probably; .NET can start a new thread for tasks if needed), so they are not running in parallel. If they were truly Parallel, they would all be running in different threads, and the execution times would not be increasing for each execution.

Updated functions:

    private static async Task RunParallel()
    {
        var tasks = new List<Task>();
        for (var i = 0; i < 300; i++)
        {
            tasks.Add(CallSomeWebsite(i));
        }
        Console.WriteLine("All loaded");
        await Task.WhenAll(tasks);
    }

    private static async Task CallSomeWebsite(int i)
    {
        var watch = Stopwatch.StartNew();
        using (var result = await _httpClient.GetAsync("https://www.google.com").ConfigureAwait(false))
        {
            // more work here, like checking success etc.
            Console.WriteLine($"{i}: {watch.ElapsedMilliseconds}");
        }
    }
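
To see which thread each continuation actually resumes on, a minimal sketch (CallSomeWebsiteWithThreadInfo is a hypothetical variant; requires using System.Threading;):

    // Minimal sketch: log the managed thread ID before and after the await
    // to observe where each continuation resumes.
    private static async Task CallSomeWebsiteWithThreadInfo(int i)
    {
        Console.WriteLine($"{i} started on thread {Thread.CurrentThread.ManagedThreadId}");
        using (var result = await _httpClient.GetAsync("https://www.google.com").ConfigureAwait(false))
        {
            Console.WriteLine($"{i} resumed on thread {Thread.CurrentThread.ManagedThreadId}");
        }
    }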

As for the reason that the printed time is longer for the asynchronous execution than for the synchronous execution: your current method of tracking time does not account for the time spent between execution halting and continuing. That is why all of the reported execution times increase over the set of completed requests. If you want an accurate time, you will need to find a way of subtracting the time spent between the await occurring and execution continuing. The issue isn't that it is taking longer; it is that you have an inaccurate reporting method. If you sum the times of all the synchronous calls, the total is actually significantly more than the maximum time of any asynchronous call:

Sync: 27965
Max Async: 2341
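
A minimal sketch of how those two numbers could be computed, assuming the per-request timings were collected into two lists (syncTimes and asyncTimes are hypothetical names):

// Minimal sketch; requires using System.Linq;. syncTimes and asyncTimes are
// hypothetical List<long> collections of per-request milliseconds.
Console.WriteLine($"Sync: {syncTimes.Sum()}");       // sequential cost is the sum
Console.WriteLine($"Max Async: {asyncTimes.Max()}"); // parallel cost is roughly the slowest task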
Jacob Lambert
  • I don't think you answer my question. Of course "All loaded" gets printed before any task completes. And I am aware that the task completion order is somewhat random. This still doesn't explain why the response times are so much longer. Also, it seems Tasks can run on different threads: https://stackoverflow.com/q/33821679/2829009 – Manuel Allenspach Aug 16 '17 at 16:23
  • @Manu See my edit. I thought you were asking something else. And as for `Task`s running on different threads, I said that that is a possibility in my answer but in this instance that is probably not happening. – Jacob Lambert Aug 16 '17 at 16:58
  • Oh, I see now. My timing method is somewhat flawed. The only problem is that the `Timeout` property of the `HttpClient` seems to measure the time in a similar way... – Manuel Allenspach Aug 17 '17 at 08:40
  • "If they were truly Parallel, they would all be running in different threads, and the execution times would not be increasing for each execution." That is not accurate. These are I/O-bound tasks, and forcing 300 threads into the mix will not speed any of them up. I would expect roughly the same results in that scenario, or worse. If the OP wants all 300 calls to happen concurrently, then he is already doing it correctly with `Task.WhenAll`. – Todd Menier Aug 17 '17 at 15:05