
I'm trying to make as many HTTP requests to a URL as possible, as quickly as possible.

I'm using this code to throttle the maximum degree of parallelism, so I don't exhaust memory by spawning lots and lots of Tasks at once.

    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        // Split the source into dop partitions; one task consumes each
        // partition, so at most dop invocations of body are in flight at once.
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

This seems to be working fine.
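For context, I call it roughly like this (a sketch; the item count and dop here are placeholders, not my real benchmark harness):

    // Sketch only: placeholder values, not the real benchmark inputs.
    var items = Enumerable.Range(0, 2500);
    await items.ForEachAsync(100, i => Body());  // dop = 100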

body() essentially boils down to:

    async Task Body()
    {
        var r = WebRequest.Create("// the url");
        // Dispose the response so the connection is returned to the pool.
        using (await r.GetResponseAsync()) { }
    }

However, I seem to have a bottleneck somewhere. If I do 2500 iterations with varying values for dop, I get these results:

    DOP   Total Time        Average (ms)  StDev
    50    00:00:14.4801781  246.6088      84.1327983759009
    75    00:00:09.8089530  265.758       110.22912244956
    100   00:00:11.9899793  344.9168      173.281468939295
    200   00:00:09.1512825  627.0492      572.616238312676
    500   00:00:09.3556978  1361.5328     1798.70589239157
    750   00:00:12.6076035  2009.058      2348.20874093199
    1000  00:00:11.4721195  2453.782      2481.56238190299
    2000  00:00:11.6039888  4100.5536     2459.36983911063

These results seem to suggest that dop=50 is below the bottleneck. Above dop≈100, however, the average time each request takes (that is, the average over 2500 runs of how long Func<T, Task> body takes to complete) increases almost linearly with DOP (there's a bit of noise in these results admittedly, but they're repeatable with small error).

This suggests that there is a "queue" inside the work body is doing, right?
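For reference, the "Average (ms)" numbers come from timing each call roughly like this (a sketch, not my exact measuring code; `times` and `TimedBody` are names I'm using here for illustration):

    // Sketch: collect per-request latencies, then aggregate after the run.
    var times = new ConcurrentBag<double>();

    async Task TimedBody()
    {
        var sw = Stopwatch.StartNew();
        await Body();
        sw.Stop();
        times.Add(sw.Elapsed.TotalMilliseconds);
    }

    // Average (ms) = times.Average(); StDev is computed over the same samples.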

I'm already setting

    ServicePointManager.DefaultConnectionLimit = int.MaxValue;

and if I do

    servicePoint = ServicePointManager.FindServicePoint("// the url", null);

and monitor

    servicePoint.CurrentConnections

on each execution of body, it's always equal to dop (except for the initial ramp-up and tail-off).
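Put together, the monitoring is roughly this (a sketch; the URL is a placeholder):

    // Sketch: inspect the live connection count for the target endpoint.
    var sp = ServicePointManager.FindServicePoint(new Uri("http://example.org/"));
    Console.WriteLine(sp.CurrentConnections);  // reads == dop in steady state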

I've tried this from various networks, so it's unlikely to be hardware-based, and it shouldn't be the remote server, as it's designed for heavy inbound loads (not that the numbers I'm talking about are even heavy).

How can I better profile what I'm doing?

Andrew Bullock
  • There are a lot of possibilities here, but my first guess would be that you're hitting the Windows concurrent connect limit. Another possibility is that the server's hardware is treating your connects as a potential DOS attack and throttling them. – Collin Dauphinee Dec 27 '15 at 21:38
  • Avg of what? Per request or total? Post the measuring code. – usr Dec 27 '15 at 21:39
  • @usr Average time to execute `body` 2500 times, I'll clarify the question – Andrew Bullock Dec 27 '15 at 21:39
  • @CollinDauphinee Can you comment more on the Windows setting? Shouldn't be the server, it's designed to be hammered – Andrew Bullock Dec 27 '15 at 21:40
  • @AndrewBullock: Windows has a limit on concurrent connections, and I *believe* it defaults to 1000. There's a registry setting for it, but I'm not in a position to find it right now. – Collin Dauphinee Dec 27 '15 at 21:41

1 Answer


The total time to perform all that work levels off at between 9 and 11 seconds. That makes sense: as you increase the DOP (exponentially), you eventually saturate the backend resource, the network, or something else.

I bet if you had posted lower DOP benchmark numbers we would see higher total times.

At that point, when you double the number of concurrent requests, the average completion time doubles.
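That's Little's law: average latency ≈ requests in flight ÷ throughput. With throughput pinned at roughly 2500 items / ~10 s ≈ 250 items/s, DOP 200 predicts an average of about 0.8 s and DOP 500 about 2 s, which is the shape of the numbers above (ramp-up and tail-off blur the exact values).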

Look at the throughput measured in items per second, or at the total time taken. That's the interesting metric; per-item latency is not.
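For example, the number worth comparing across runs is just this (a sketch; the figures are taken from the DOP 200 row above):

    // Throughput is the metric to compare across DOP values.
    double items = 2500;
    double elapsedSeconds = 9.15;                    // total wall-clock time of the run
    double itemsPerSecond = items / elapsedSeconds;  // ≈ 273 items/s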

usr
  • Yeah, I did lots of tests and snipped the results for brevity. Below dop=75, total time increases linearly, as you'd expect. Presumably for items/s I need to measure items completing per second. `ServicePoint.CurrentConnections` is always == dop, so the queuing seems deeper down than my code. – Andrew Bullock Dec 28 '15 at 01:20
  • Can you post all code used to measure this? It's a little unclear what those numbers represent. Also, can you say what results you expected? – usr Dec 28 '15 at 17:07
  • Here is my code; it's using the AWS SES service (which I'm allowed to send 1,000 requests/s to; after that I should see throttling): https://gist.github.com/trullock/a112885d374a081aee4d – Andrew Bullock Jan 24 '16 at 11:35
  • The code looks fine. The thread pool config should not do anything, because async IO does not use threads, and the parallel foreach does not use any either. Since, according to the service point, the expected number of connections is going out, it *looks* like SES throttling (or network saturation). You can test this by doing the same thing against an unthrottled web server. For example, try google.com or example.org. – usr Jan 24 '16 at 13:22
  • Is it right that one call takes about 2.2ms? That seems extremely fast... Suspicious. Internet round trip time should be higher. Like 10-30ms for a home connection. Try turning off HTTP pipelining and make sure using netstat (or anything else) that there are DOP TCP connections open at the same time. – usr Jan 24 '16 at 13:26
  • Thanks for your time. Yeah, the results suggest I'm getting throttled somewhere. I shall try it on different hosts, and perhaps using HttpClient directly (which is what the AWS SDK uses deep inside). Any idea how to find the bottleneck if it's network-related? – Andrew Bullock Jan 24 '16 at 13:58
  • Where did 2.2 come from? – Andrew Bullock Jan 24 '16 at 13:59
  • If you're interested, here's what `out.csv` looks like for dop:1000, iterations:10000: https://gist.github.com/trullock/b5c2253b9b5df0d02d67 – Andrew Bullock Jan 24 '16 at 14:00
  • 2.2 = 11sec/5000. Right? Ah, wrong because it's parallel. – usr Jan 24 '16 at 14:02