Hi, I'm new to multithreading, and I'm struggling to download multiple files from the web using DownloadFileAsync. There are about 400 files to download, and I prepared the URLs to request using the WebClient class. I called DownloadFileAsync via the ThreadPool, hoping that it would be faster than downloading serially. Each URL looks like this, with the item number changing per URL (104, 105, etc.):

http://medicarestatistics.humanservices.gov.au/statistics/do.jsp?_PROGRAM=%2Fstatistics%2Fmbs_item_standard_report&DRILL=ag&group=104&VAR=services&STAT=count&RPT_FMT=by+state&PTYPE=month&START_DT=202101&END_DT=202101

My code looks like this:

        foreach(var d in infolist)
        {
            string itemtype = d.Key;
            Dictionary<string, string> folderAndurl = d.Value;
            foreach (var itemcode in itemcodes)
            {
                foreach (var date in dates)
                {
                    filename = folderAndurl["folder"] + date + "_" + itemcode + ".xls";
                    url = folderAndurl["url"].Replace("XXX", itemcode).Replace("STDATE", date);

                    ThreadPool.UnsafeQueueUserWorkItem(new WaitCallback(DownloadWebAsync), new object[] { filename, url });
                    //ThreadPool.QueueUserWorkItem(new WaitCallback(DownloadWebAsync), new object[] { filename, url });
                }
            }
        }

And DownloadWebAsync is as below:

    private void DownloadWebAsync(object state)
    {
        object[] list = state as object[];
        string filename = Convert.ToString(list[0]);
        string url = Convert.ToString(list[1]);

        WebClient client = new WebClient();
        Uri uri = new Uri(url);
        client.DownloadFileCompleted += new AsyncCompletedEventHandler(Client_DownloadFileCompleted);
        client.QueryString.Add("file", filename); 
        client.QueryString.Add("url", url); 
        client.DownloadFileAsync(uri, filename);

        //throw new NotImplementedException();
    }

When the ThreadPool starts, I can see that multiple blank files are created on disk straight away, as shown in the image below. They are all 0 KB in size to start with, so I'm assuming all the ThreadPool threads are running and sending their requests to the website.

screenshot

However, the files on disk seem to be filled with downloaded data one at a time, or at most two at a time (mostly one). I expected those 0 KB files to be updated simultaneously: at least 3 or 4 files should be in progress at any point in time, since the threads that call DownloadFileAsync are already running. I have no idea whether I'm doing something wrong in the code or whether some property needs to be set. I want simultaneous downloads to reduce the total download time, but that is not happening right now.

Another reason I'm using the ThreadPool is that I'm writing the status, URL, and download size back to a UI window, and I don't want the UI to become unresponsive while the 400 files download.

I have also tested with Thread, ThreadPool, and the Task Parallel Library, and with both WebClient and HttpClient (async/await). In all cases, once the threads or tasks start, the blank files are created straight away, but the actual downloads happen one at a time. I also tested WebClient.DownloadFile, but timeout errors occur when running it through the ThreadPool, so I have to use the async version.

Could someone please explain whether this is expected behaviour, or how I can improve the download experience? I have been struggling with this for nearly a week, and your help is greatly appreciated.

Regards

fatihyildizhan
Williams
  • You want to start 400 simultaneous downloads? I fear it won't go as well as you think... – Caius Jard Apr 18 '21 at 10:43
  • Related: [Efficient way to download a huge load of files in parallel](https://stackoverflow.com/questions/62903115/efficient-way-to-download-a-huge-load-of-files-in-parallel). You can also find [here](https://stackoverflow.com/questions/60929044/c-sharp-parallel-foreach-memory-usage-keeps-growing/60930992#60930992) a solution that utilizes the TPL Dataflow library. – Theodor Zoulias Apr 18 '21 at 10:44
  • Hi, could you try changing your `DownloadWebAsync` to `Task DownloadWebAsync` and return `client.DownloadFileTaskAsync(uri, filename)`, and change `ThreadPool.UnsafeQueueUserWorkItem` to `Task.Run(async () => await DownloadWebAsync(filename, uri))`? Also try adding the tasks to a collection and awaiting them after you have started them, either with `Task.WhenAll()` or by using `await` directly on them. – Tomas Tomov Apr 18 '21 at 10:47
  • Unless there's a good reason not to, use the `Task` library instead of `ThreadPool` – asaf92 Apr 18 '21 at 12:20
  • Thank you @TomasTomov , I will change the code as you suggested. Will let you know the progress. – Williams Apr 19 '21 at 02:19
  • @Williams you're using the wrong class. `WebClient` is an obsolete class used by desktop applications back when only one or two calls were made at a time. Use `HttpClient` and `async/await` instead. There's no reason to use `ThreadPool.UnsafeQueueUserWorkItem` in any supported .NET version. I could even say `in any .NET version released in the last 10 years` – Panagiotis Kanavos Apr 20 '21 at 13:09
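
The suggestions in the comments above (drop `ThreadPool`, use `HttpClient` with `async`/`await`, collect the tasks and await `Task.WhenAll`) could be sketched roughly as follows. This is a minimal sketch, not a tested fix for the asker's program: `ThrottledDownloader`, `RunThrottledAsync`, `DownloadAllAsync`, and the default `maxConcurrency` of 4 are hypothetical names and values chosen for illustration, and the `SemaphoreSlim` throttle is an addition that none of the comments prescribe. It assumes C# 7 or later for the tuple syntax.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public static class ThrottledDownloader
{
    // One shared HttpClient for the whole run; instances are meant to be reused.
    private static readonly HttpClient Http = new HttpClient();

    // Runs `action` once per item, with at most `maxConcurrency` running at a time.
    public static async Task RunThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> action)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // wait for a free slot
                try { await action(item); }
                finally { gate.Release(); }      // free the slot even on failure
            }).ToList();                         // ToList starts every task now

            await Task.WhenAll(tasks);           // completes when all items are done
        }
    }

    // Streams one response body to disk without buffering it fully in memory.
    public static async Task DownloadFileAsync(string url, string filename)
    {
        using (var response = await Http.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var source = await response.Content.ReadAsStreamAsync())
            using (var target = File.Create(filename))
            {
                await source.CopyToAsync(target);
            }
        }
    }

    // Downloads every (Url, Filename) pair; the default of 4 is an assumed value.
    public static Task DownloadAllAsync(
        IEnumerable<(string Url, string Filename)> jobs, int maxConcurrency = 4)
    {
        return RunThrottledAsync(
            jobs, maxConcurrency, job => DownloadFileAsync(job.Url, job.Filename));
    }
}
```

A caller would build the `(url, filename)` pairs in the existing nested loops and then `await ThrottledDownloader.DownloadAllAsync(jobs)` from an `async` event handler, which leaves the UI thread free while the downloads run. On .NET Framework, outgoing connections per host default to 2 for client applications (`ServicePointManager.DefaultConnectionLimit`), which may be why only one or two files progress at a time; whether that applies here depends on the runtime, which the question does not state.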

0 Answers