
I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.

Presumably I'd want to use the async web request methods; however, I'm wondering what the overhead of storing all the Task continuations and the marshalling would be.

Memory consumption is not an overriding concern; the objective is speed.

Presumably I'd also want to make use of all the cores available.

So I can do something like this:

Parallel.ForEach(iterations, async i =>
{
    var response = await MakeRequest(i);
    // do something with the response
});

but that won't make me any faster than my number of cores allows.

I can do:

Parallel.ForEach(iterations, i =>
{
    var response = MakeRequest(i);
    response.GetAwaiter().OnCompleted(() =>
    {
        // do something with the response
    });
});

but how do I keep my program running after the ForEach? Holding on to all the Tasks and WhenAll-ing them feels bloated; are there any existing patterns or helpers for some kind of Task queue?

Is there any way to do better, and how should I handle throttling/error detection? For instance, if the remote endpoint is slow to respond, I don't want to keep spamming it.

I understand I also need to do:

ServicePointManager.DefaultConnectionLimit = int.MaxValue;

Anything else necessary?

Andrew Bullock

3 Answers


The Parallel class does not work with async loop bodies, so you can't use it here. Your loop body completes almost immediately and hands back a task, so there is no parallelism benefit.

This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given degree of parallelism (DOP); this one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx (use the last piece of code).
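
For reference, the last piece of code from that post is essentially the extension method below (reproduced from memory, so treat it as a sketch rather than a verbatim copy). The usage example assumes the iterations and MakeRequest names from the question:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class EnumerableExtensions
{
    // Partitions the source into 'dop' streams and runs one worker task per
    // partition, so at most 'dop' operations are in flight at any moment.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}

// Inside an async method: items are pulled from the enumerator as workers
// free up, so tasks are not all materialized up front.
await iterations.ForEachAsync(dop: 256, async i =>
{
    var response = await MakeRequest(i);
    // process the response here
});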

You need to determine the right DOP empirically. Simply try different values; there is no theoretical way to derive the best one because it depends on many factors.
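
A crude sweep might look like this (a hypothetical sketch: RunBatchAsync stands in for whatever issues a fixed batch of requests at the given DOP, e.g. via the ForEachAsync helper above):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Inside an async method: try a range of DOP values and measure the
// sustained throughput achieved by each.
foreach (var dop in new[] { 8, 16, 32, 64, 128, 256, 512 })
{
    const int batchSize = 10000; // keep the batch large so one-time effects wash out
    var sw = Stopwatch.StartNew();
    await RunBatchAsync(dop, batchSize); // hypothetical: issues batchSize requests at this DOP
    sw.Stop();
    Console.WriteLine("DOP {0}: {1:F0} req/s", dop, batchSize / sw.Elapsed.TotalSeconds);
}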

The connection limit is the only limit that's in your way.

response.GetAwaiter().OnCompleted

I'm not sure what you were trying to accomplish there... if you comment, I'll explain the misunderstanding.

usr
  • Because I'll have so many Tasks, it seems wasteful to hold on to them all forever, waiting for the final Task to complete. It seems there should be a much more efficient producer/consumer pattern here. – Andrew Bullock Dec 25 '15 at 10:58
  • I realise the example code in the question is far from complete. It feels like I could have a `ConcurrentBag` of tasks, where the tasks remove themselves once complete, and then wait on anything in the bag. – Andrew Bullock Dec 25 '15 at 11:35
  • What I propose here would stream the tasks. There is no need to hold onto all of them. Did you look at the linked blog post? – usr Dec 25 '15 at 19:07
  • Can you explain the streaming bit to me? I can't see how it doesn't create all the tasks at once. Are you just referring to the batching/DOP bit in the last example, so you're only making as many tasks as there are in a batch at once? Doesn't that mean a single slow task in a batch could prevent the next batch from starting? – Andrew Bullock Dec 26 '15 at 09:58
  • That code creates one task per partition and puts it into WhenAll. Maybe that was the misunderstanding? There are no batches. Are we talking about the same piece of code? – usr Dec 26 '15 at 12:02
  • Reading your other answers here http://stackoverflow.com/a/11139555/28543 and here http://stackoverflow.com/a/11100423/28543 makes this make sense now. Thanks. Got any advice on how to empirically measure an appropriate DOP? – Andrew Bullock Dec 26 '15 at 12:22
  • Try different values and measure throughput. Don't make the benchmark too short; that's the most common mistake. All one-time effects must disappear into the noise. If you find that it's impossible to have one optimal DOP for all possible workloads (e.g. different servers), then the problem becomes a lot harder. – usr Dec 26 '15 at 13:13

The operation you want to perform is

  1. Call an I/O method
  2. Process the result

You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.

You will benefit from parallelism in the second part - processing the result - as this is a CPU-bound operation. Luckily, async/await will do all the work for you. Console applications don't have a synchronization context, which means the part of the method after an await will run on a thread pool thread, utilizing all CPU cores.

private async Task MakeRequestAndProcessResult(int i)
{
    var result = await MakeRequestAsync(i);
    ProcessResult(result);
}

var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();

To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).

var result = await MakeRequestAsync().ConfigureAwait(false);

To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
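
For example, in a console app (a minimal sketch; it assumes the MakeRequestAndProcessResult method above is made static):

static void Main(string[] args)
{
    var iterations = Enumerable.Range(0, 100000);
    var tasks = iterations.Select(MakeRequestAndProcessResult).ToArray();
    Task.WaitAll(tasks); // blocks Main until every request has completed
}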

Throwing 100k requests at a web service will probably kill it, so you will have to limit the concurrency. You can check the answers to this question to find some options for how to do it.
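
One common option is to use a SemaphoreSlim as an asynchronous throttle (a sketch, with an arbitrary limit of 100 concurrent requests):

using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var throttle = new SemaphoreSlim(100); // at most 100 requests in flight at once

var tasks = iterations.Select(async i =>
{
    await throttle.WaitAsync();
    try
    {
        await MakeRequestAndProcessResult(i);
    }
    finally
    {
        throttle.Release();
    }
}).ToArray();

await Task.WhenAll(tasks);

Note that this still materializes one Task per item; it only limits how many requests run concurrently.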

Jakub Lortz
  • Thanks, didn't realise console apps had no SyncContext. With this approach, though, I'll end up with 10,000s of Tasks in an array, which I'll be holding on to until everything completes. Isn't there a more efficient producer/consumer approach? – Andrew Bullock Dec 25 '15 at 10:57
  • This is really problematic because it's starting all tasks at once. Might overwhelm the resource being called, and exceed RAM. – usr Dec 25 '15 at 19:08
  • @usr In the last paragraph there is a link to another question on SO that is exactly about this issue. In my answer to that question I suggested the same `ForEachAsync` method you linked in your answer. – Jakub Lortz Dec 25 '15 at 19:59
  • @AndrewBullock usr's answer deals with that problem. And so do the answers to the question I linked in the last paragraph. – Jakub Lortz Dec 25 '15 at 20:01
  • Thanks both, appreciate your help. Feels like I'm missing something at this point; I'll re-read everything :) – Andrew Bullock Dec 26 '15 at 09:49

Parallel.ForEach is able to use more threads than there are cores if you explicitly set the MaxDegreeOfParallelism property of the ParallelOptions parameter (in the overload of ForEach that takes one) - see https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
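
For example (a sketch; it assumes a synchronous MakeRequestSync and ProcessResult, since Parallel.ForEach ties up a thread for the whole of each iteration):

using System.Threading.Tasks;

var options = new ParallelOptions { MaxDegreeOfParallelism = 1000 };

Parallel.ForEach(iterations, options, i =>
{
    // Each iteration blocks its thread until the request completes, so this
    // only scales if the request is made synchronously.
    var response = MakeRequestSync(i);
    ProcessResult(response);
});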

You should be able to set this to 1,000 to get it to use 1,000 threads, or even more, but that might not be efficient due to the threading overheads. You may wish to experiment (e.g. loop from 100 to 1,000 in steps of 100, submitting 1,000 requests each time and timing start to finish), or even set up some kind of self-tuning algorithm.

wizzardmr42