I would like to process a list of 50,000 urls through a web service, The provider of this service allows 5 connections per second.
I need to process these urls in parallel with adherence to provider's rules.
This is my current code:
static void Main(string[] args)
{
process_urls().GetAwaiter().GetResult();
}
public static async Task process_urls()
{
// let's say there is a list of 50,000+ URLs
var urls = System.IO.File.ReadAllLines("urls.txt");
var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 5);
foreach (var url in urls)
{
await throttler.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
Console.WriteLine(String.Format("Starting {0}", url));
var client = new HttpClient();
var xml = await client.GetStringAsync(url);
//do some processing on xml output
client.Dispose();
}
finally
{
throttler.Release();
}
}));
}
await Task.WhenAll(allTasks);
}
Instead of var client = new HttpClient();
I will create a new object of the target web service but this is just to make the code generic.
Is this the correct approach to handle and process a huge list of connections? and is there anyway I can limit the number of established connections per second to 5 as the current implementation will not consider any timeframe?
Thanks