I'm creating an app to show timetables. To do that, I need to download a huge amount of data from my local public transport API to my own API. There are nearly 6000 JSON files to download at the beginning of the month (approximately 190 bus lines, each with one URL per day of the month), and the number of JSONs to download decreases daily (about 5800 on the second day, about 5600 on the third, and so on). Each JSON is about 3 MB, so with 6000 URLs at the beginning of each month that is about 17 GB of data. Given that, I was wondering if there's a faster way to do it.
At first I tried this function:
public async static Task<List<string>> MassDataDownload(List<string> urlList)
{
    List<string> listOfJsons = new List<string>();
    using (HttpClient client = new HttpClient())
    {
        foreach (var url in urlList)
        {
            HttpResponseMessage response = await client.GetAsync(url);
            var json = string.Empty;
            if (response.IsSuccessStatusCode)
                json = await response.Content.ReadAsStringAsync();
            else
                continue;
            listOfJsons.Add(json);
        }
    }
    return listOfJsons;
}
but it took almost 10 minutes to download less than 1/10 of the links. Then I came across this mass data download SO page, and from it I understood my mistakes to be using a single HttpClient for all downloads and hitting one site many times within a short amount of time. Based on the information there, I created this function:
public async static Task<List<string>> MassDataDownload(List<string> urlList)
{
    BlockingCollection<HttpClient> ClientQueue = new BlockingCollection<HttpClient>();
    urlList.ForEach(x => ClientQueue.Add(new HttpClient()));
    List<string> listOfJsons = new List<string>();
    foreach (var url in urlList)
    {
        var worker = ClientQueue.Take();
        var json = await worker.GetStringAsync(url);
        worker.Dispose();
        listOfJsons.Add(json);
    }
    return listOfJsons;
}
but the download is still slow. Is there a faster way to download this data, or is there a framework that could help me achieve that?
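For reference, both functions above await each request inside the loop, so only one download is in flight at a time regardless of how many HttpClient instances exist. Below is a minimal sketch of one possible direction: start the requests concurrently, with a cap on how many run at once. The names `BoundedDownloader`, `DownloadAllAsync`, `fetch` and `maxParallel` are hypothetical, and whether the transport API tolerates several parallel requests is an assumption that would need checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedDownloader
{
    // Hypothetical helper: runs fetch(url) for every URL, with at most
    // maxParallel calls in flight at the same time.
    // Results come back in the same order as the input URLs.
    public static async Task<List<string>> DownloadAllAsync(
        IReadOnlyList<string> urls,
        Func<string, Task<string>> fetch,
        int maxParallel)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            // ToList() forces every task to be created now; each one waits
            // on the semaphore before it actually calls fetch.
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    return await fetch(url);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // Completes once all tasks have finished; if any fetch threw,
            // the exception is rethrown here.
            string[] results = await Task.WhenAll(tasks);
            return results.ToList();
        }
    }
}
```

With a single shared `HttpClient client`, a call could look like `await BoundedDownloader.DownloadAllAsync(urlList, url => client.GetStringAsync(url), 8)`. The value 8 is an arbitrary starting point, not a measured optimum. Like the functions above, this sketch still keeps every JSON in memory at once.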