I am using the following pattern to perform a large number of operations (potentially millions):
var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 8);

foreach (var file in filesToUpload)
{
    await throttler.WaitAsync();

    allTasks.Add(
        Task.Run(async () =>
        {
            try
            {
                await UploadFileAsync(file);
            }
            finally
            {
                throttler.Release();
            }
        }));
}

await Task.WhenAll(allTasks);
However, I'm concerned about accumulating a huge number of Task objects in the allTasks collection. From some diagnostic runs, I appear to have built up about 1 GB of memory for ~100k Task objects.
Is there any change that can be made to the pattern above to phase out finished tasks, but still retain the throttling effect of the overall pattern?
The only thing that I can think of myself is partitioning / batching the overall dataset so that the code above only ever operates on, say, 1000 elements at a time. Is that the most appropriate approach here?
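For illustration, the batching I have in mind would look roughly like this (just a sketch, not code I'm actually running; it assumes filesToUpload is an in-memory List&lt;string&gt;, the usual System.Linq / System.Threading.Tasks usings, and an arbitrary batch size of 1000):

const int batchSize = 1000;
for (int i = 0; i < filesToUpload.Count; i += batchSize)
{
    var batch = filesToUpload.Skip(i).Take(batchSize);
    var batchTasks = new List<Task>();
    var throttler = new SemaphoreSlim(initialCount: 8);

    foreach (var file in batch)
    {
        await throttler.WaitAsync();
        batchTasks.Add(Task.Run(async () =>
        {
            try { await UploadFileAsync(file); }
            finally { throttler.Release(); }
        }));
    }

    // At most batchSize Task objects are alive at any one time, but the
    // slowest uploads in each batch must finish before the next batch
    // starts, so the 8 slots aren't always kept full.
    await Task.WhenAll(batchTasks);
}

The obvious downside (noted in the comment) is that concurrency drains to zero at every batch boundary, which is partly why I'm asking whether there is a better pattern.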
UPDATE
So, based on your advice, Henk, I've implemented the following:
var uploadFileBlock = new ActionBlock<string>(async file =>
{
    await UploadFileAsync(file);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

foreach (var file in filePaths)
{
    await uploadFileBlock.SendAsync(file);
}

// Signal that no more files will be posted, then wait for the
// in-flight uploads to drain.
uploadFileBlock.Complete();
await uploadFileBlock.Completion;
This seems to work fine, and the memory profile stays relatively low the whole time. Does this implementation look OK to you?