The Parallel.ForEach method is not well suited for I/O-bound operations, because it requires a thread for each parallel workflow, and threads are not cheap resources. You can make it work by increasing the number of threads that the ThreadPool creates immediately on demand, with the SetMinThreads method, but that's not as efficient as using asynchronous programming and async/await. With asynchronous programming a thread is not required while the file is being downloaded, or while the file is being saved to disk, so it is possible to download dozens of files concurrently using only a handful of threads.
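For completeness, here is a minimal sketch of the SetMinThreads workaround mentioned above, in case you have to keep Parallel.ForEach for now. The class name ThreadPoolTuning, the helper EnsureMinWorkerThreads and the value 16 are hypothetical, chosen only for illustration. It is still less efficient than async/await.

```csharp
using System;
using System.Threading;

public static class ThreadPoolTuning
{
    // Raises the minimum number of worker threads to at least `desiredWorkers`,
    // leaving the I/O completion-port minimum unchanged.
    // Returns false if the runtime rejected the new values.
    public static bool EnsureMinWorkerThreads(int desiredWorkers)
    {
        ThreadPool.GetMinThreads(out int workers, out int completionPorts);
        if (workers >= desiredWorkers) return true; // already high enough
        return ThreadPool.SetMinThreads(desiredWorkers, completionPorts);
    }
}
```

You would call ThreadPoolTuning.EnsureMinWorkerThreads(16) once at startup, before the parallel loop, with a value that roughly matches the MaxDegreeOfParallelism you intend to use.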
Using the Partitioner for creating ranges is a useful technique when parallelizing extremely granular (lightweight) workloads, like adding or comparing numbers. In your case the workload is quite coarse (chunky), so using ranges is more likely to slow things down than speed them up. Each range is processed sequentially by a single worker, so ranges also prevent balancing the workload when some files take longer to download than others.
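To illustrate the kind of workload where ranges do pay off, here is a minimal sketch that sums an array of integers with Partitioner.Create and Parallel.ForEach. The class name RangeSumDemo and the method ParallelSum are hypothetical, and this is not part of your download code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangeSumDemo
{
    // Sums an array in parallel by handing each worker a contiguous index range,
    // so the per-item delegate overhead is paid once per range, not once per number.
    public static long ParallelSum(int[] values)
    {
        // Partitioner.Create(from, to) throws for an empty range, so guard it.
        if (values.Length == 0) return 0;

        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, values.Length),   // yields Tuple<int, int> ranges [from, to)
            () => 0L,                               // initial per-worker subtotal
            (range, loopState, subtotal) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    subtotal += values[i];
                return subtotal;
            },
            subtotal => Interlocked.Add(ref total, subtotal)); // merge each worker's subtotal
        return total;
    }
}
```

Adding two numbers is so cheap that invoking a delegate per element would dominate the cost, which is why batching into ranges helps here. Downloading a file is the opposite case: the per-item overhead is negligible compared to the work itself.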
My suggestion is to use the Parallel.ForEachAsync method (introduced in .NET 6), which is designed specifically for parallelizing asynchronous I/O operations. Here is how you can use this method in order to download the files in parallel, with a specific degree of parallelism, and cancellation support:
    // Required namespaces: System, System.Collections.Generic, System.IO,
    // System.Net.Http, System.Threading, System.Threading.Tasks

    private static readonly string _baseUrlPattern =
        "http://url.com/Handlers/Image.ashx?imageid={0}&type=image";

    // A single shared HttpClient is reused for all requests.
    private static readonly HttpClient _httpClient = new HttpClient();

    internal static void DownloadAllMissingPictures(
        IEnumerable<ListObject> imagesToDownload, string imageFolderPath,
        CancellationToken cancellationToken = default)
    {
        var parallelOptions = new ParallelOptions()
        {
            MaxDegreeOfParallelism = 10, // at most 10 downloads in flight
            CancellationToken = cancellationToken,
        };
        Parallel.ForEachAsync(imagesToDownload, parallelOptions, async (image, ct) =>
        {
            string imageId = image.ImageId;
            string url = String.Format(_baseUrlPattern, imageId);
            string filePath = Path.Combine(imageFolderPath, imageId);
            using HttpResponseMessage response = await _httpClient.GetAsync(url, ct);
            response.EnsureSuccessStatusCode();
            // File.Create truncates an existing file, so no stale bytes are left behind.
            using FileStream fileStream = File.Create(filePath);
            // Pass the token so that cancellation also interrupts the copy.
            await response.Content.CopyToAsync(fileStream, ct);
        }).Wait();
    }
The Parallel.ForEachAsync method returns a Task. It's recommended that Tasks are awaited, but taking into account that you are probably not familiar with asynchronous programming yet, let's just Wait for it instead for the time being. Be aware that Wait blocks the calling thread until all downloads have completed.
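Once you are comfortable with async/await, the method can return a Task and await the loop instead of blocking. Here is a minimal sketch of that shape. The class AwaitedDownloadSketch, the method DownloadAllAsync and the downloadOneAsync delegate are hypothetical names. The delegate stands in for the download-and-save body shown earlier, so that the sketch does not depend on your ListObject type or on the network.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AwaitedDownloadSketch
{
    // Runs `downloadOneAsync` for every id, with bounded parallelism,
    // and completes when all of them have finished (or one has failed).
    public static async Task DownloadAllAsync(
        IEnumerable<string> imageIds,
        Func<string, CancellationToken, ValueTask> downloadOneAsync,
        int maxDegreeOfParallelism = 10,
        CancellationToken cancellationToken = default)
    {
        var parallelOptions = new ParallelOptions()
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism,
            CancellationToken = cancellationToken,
        };
        // Awaiting releases the calling thread while the downloads are in flight.
        await Parallel.ForEachAsync(imageIds, parallelOptions, downloadOneAsync);
    }
}
```

The caller then has to be async as well, for example: await AwaitedDownloadSketch.DownloadAllAsync(ids, async (id, ct) => { /* download and save one image */ });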
In case the implementation above does not improve the performance of the whole procedure, you could experiment with the MaxDegreeOfParallelism configuration, and also with the settings mentioned in this question: How to increase the outgoing HTTP requests quota in .NET Core?