I need to fetch content from some 3000 URLs. I'm using `HttpClient`: I create a `Task` for each URL, add the tasks to a list, and then `await Task.WhenAll`. Something like this:
```csharp
var tasks = new List<Task<string>>();
foreach (var url in urls)
{
    var task = Task.Run(() => httpClient.GetStringAsync(url));
    tasks.Add(task);
}
var t = Task.WhenAll(tasks);
```
However, many tasks end up in the `Faulted` or `Canceled` state. I thought it might be a problem with the specific URLs, but no: I can fetch those same URLs in parallel with curl without any problem.
I tried `HttpClientHandler` and `WinHttpHandler` with various timeouts etc. Several hundred URLs always end with an error.
Then I tried fetching the URLs in batches of 10, and that works: no errors, but it is very slow, while curl fetches all 3000 URLs in parallel very quickly.
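For reference, the batched variant I tried looks roughly like this (a sketch, not the exact code; it assumes .NET 6+ for `Enumerable.Chunk`, and `urls`/`httpClient` are the same as in the snippet above):

```csharp
// Batch-of-10 approach: wait for each batch to finish before starting the next.
// This is reliable but slow, because at most 10 requests are ever in flight.
foreach (var batch in urls.Chunk(10))
{
    var batchTasks = batch.Select(url => httpClient.GetStringAsync(url)).ToList();
    await Task.WhenAll(batchTasks);
}
```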
Then I tried fetching httpbin.org 3000 times, to verify that the issue was not with my particular URLs:
```csharp
var handler = new HttpClientHandler() { MaxConnectionsPerServer = 5000 };
var httpClient = new HttpClient(handler);

var tasks = new List<Task<HttpResponseMessage>>();
foreach (var _ in Enumerable.Range(1, 3000))
{
    var task = Task.Run(() => httpClient.GetAsync("http://httpbin.org"));
    tasks.Add(task);
}

var t = Task.WhenAll(tasks);
try { await t.ConfigureAwait(false); } catch { }

int ok = 0, faulted = 0, cancelled = 0;
foreach (var task in tasks)
{
    switch (task.Status)
    {
        case TaskStatus.RanToCompletion: ok++; break;
        case TaskStatus.Faulted: faulted++; break;
        case TaskStatus.Canceled: cancelled++; break;
    }
}
Console.WriteLine($"RanToCompletion: {ok} Faulted: {faulted} Canceled: {cancelled}");
```
Again, several hundred tasks always end in an error state.
So, what is the issue here? Why can't I fetch those URLs with async code?
I'm using .NET Core, so the suggestion to use `ServicePointManager` (Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)) is not applicable.
Also, the URLs I actually need to fetch point to different hosts; the httpbin code is just a test, to show that the problem is not my URLs being invalid.