I have about a million URLs pointing to HTML pages on a public web server that I want to save to disk. Each page is roughly the same size, ~30 kilobytes. My URL lists are split roughly evenly across 20 folders on disk, so for simplicity I create one Task per folder, and within each task I download the URLs one after another, sequentially. That gives me about 20 parallel requests at any time. I'm on a relatively slow DSL connection, about 5 Mbps.
This represents several gigabytes of data, so I expect the process to take several hours, but I'm wondering whether I could make the approach more efficient. Am I likely making the most of my connection? How can I measure that? Is 20 parallel downloads a good number, or should I dial it up or down?
The language is F#; I'm using WebClient.DownloadFile for every URL, with one WebClient per task.
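Roughly, the structure looks like the simplified sketch below (the urls.txt file name, the URL-to-file-name mapping, and the downloadFolder/runAll helper names are placeholders for illustration, not my actual code):

open System
open System.IO
open System.Net
open System.Threading.Tasks

// One task per folder: a single WebClient downloads that folder's URLs sequentially.
let downloadFolder (folder: string) : Task =
    Task.Run(Action(fun () ->
        use webClient = new WebClient()
        for url in File.ReadAllLines(Path.Combine(folder, "urls.txt")) do
            // save each page under a file name derived from its URL
            let target = Path.Combine(folder, Uri.EscapeDataString(url) + ".html")
            webClient.DownloadFile(url, target)))

// ~20 folders => ~20 tasks downloading in parallel; block until all are done.
let runAll (folders: string[]) =
    let tasks = folders |> Array.map downloadFolder
    Task.WaitAll(tasks)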
==================================
EDIT: One thing that made a huge difference was adding the Accept-Encoding header to the request:
open System.Net

let webClient = new WebClient()
// ask the server to compress responses with gzip or deflate
webClient.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate")
This cut the size of each download from about 32 KB to about 9 KB, resulting in enormous speed gains and disk space savings. Thanks to TerryE for mentioning it!