
I am using the following pattern to perform a large number of operations (potentially millions):

var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 8);

foreach (var file in filesToUpload)
{
    await throttler.WaitAsync();

    allTasks.Add(
        Task.Run(async () =>
        {
            try
            {
                await UploadFileAsync(file);
            }
            finally
            {
                throttler.Release();
            }
        }));
}

await Task.WhenAll(allTasks);

However, I'm concerned about accumulating a huge number of Task objects in the allTasks collection. In some diagnostic runs, I seem to have built up about 1 GB of memory for ~100k Task objects.

Is there any change that can be made to the pattern above to phase out finished tasks, while still retaining the overall throttling effect?

The only thing I can think of myself is partitioning/batching the overall dataset so that the above code only ever operates on, e.g., 1000 elements at a time. Is that the most appropriate approach here?
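For illustration, a rough, untested sketch of what I mean by batching (the `Chunk` extension method here is .NET 6+; older frameworks would need a hand-rolled grouping loop):

var throttler = new SemaphoreSlim(initialCount: 8);

foreach (var batch in filesToUpload.Chunk(1000))
{
    var batchTasks = new List<Task>();
    foreach (var file in batch)
    {
        await throttler.WaitAsync();
        batchTasks.Add(Task.Run(async () =>
        {
            try { await UploadFileAsync(file); }
            finally { throttler.Release(); }
        }));
    }
    // Once WhenAll returns, this batch's Task objects become
    // eligible for garbage collection.
    await Task.WhenAll(batchTasks);
}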

UPDATE

So, based on your advice, Henk, I've implemented the following:

var uploadFileBlock = new ActionBlock<string>(async file =>
{
    await UploadFileAsync(file);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

foreach (var file in filePaths)
{
    await uploadFileBlock.SendAsync(file);
}

uploadFileBlock.Completion.Wait();

This seems to work fine, and the memory profile stays relatively low the entire time. Does this implementation look OK to you?

Chris McAtackney
  • Not a concrete answer but a) the Task.WhenAll() keeps everything in memory and b) you do a Task.Run(async void ...) to do some async I/O. Using an unnecessary thread. – H H Sep 24 '17 at 09:49
  • The [consensus](https://stackoverflow.com/a/11565531/60761) is to use TPL Dataflow. Also read the comments below that answer. – H H Sep 24 '17 at 09:54
  • The code keeps so many tasks in memory because of the Add() call. Even though 100K - 8 of those tasks are already complete, not very useful. No Clear() visible either. Instead of WhenAll, consider counting the tasks with the CountdownEvent class. – Hans Passant Sep 24 '17 at 14:53
  • @HenkHolterman I've updated the question with a code sample using TPL DataFlow - does it look like appropriate usage? Thanks for the pointers so far, much appreciated. – Chris McAtackney Sep 26 '17 at 20:55
  • I'm not an expert on DataFlow but it looks OK. I would test how this handles errors, like insert an invalid filename. Also, when you are using HttpClient, see my comment under [this answer](https://stackoverflow.com/a/46343790/60761) – H H Sep 27 '17 at 06:40
  • Ok, thanks for the advice - I'll take a look at all that. – Chris McAtackney Sep 27 '17 at 16:17
  • It looks like it will wait for the incoming messages infinitely without `uploadFileBlock.Complete()` – SerjG Apr 04 '18 at 13:31
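
As SerjG's comment notes, `Completion` only ever completes after `Complete()` has been called on the block. A minimal corrected shutdown for the update's code (using `await` rather than `.Wait()`, assuming the surrounding method is async):

uploadFileBlock.Complete();        // signal that no more files will be posted
await uploadFileBlock.Completion;  // wait for in-flight uploads to drain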

1 Answer


This is very similar to another recent SO question. As with that question, an approach that might work (although I have not tested it myself) would be:

private async Task Test()
{
  var allTasks = new List<Task>();
  foreach (var file in filesToUpload)
  {
    // Trim the list before adding another task, so at most
    // ~1000 tasks are tracked at any one time.
    await WaitList(allTasks, 1000);
    allTasks.Add(UploadFileAsync(file));
  }
  await Task.WhenAll(allTasks);
}

// Removes completed tasks until the list shrinks to maxSize or fewer.
private async Task WaitList(IList<Task> tasks, int maxSize)
{
  while (tasks.Count > maxSize)
  {
    var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
    tasks.Remove(completed);
  }
}

Not only will batching like this help with memory, but it could also help keep you from launching an inadvertent denial-of-service attack.

Other approaches might leverage the producer/consumer pattern using .NET classes such as BlockingCollection<T>.
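
A minimal, untested sketch of that alternative, assuming the same `UploadFileAsync` method and 8 consumers to mirror the throttling above:

var queue = new BlockingCollection<string>(boundedCapacity: 1000);

// Start 8 consumers that drain the queue until CompleteAdding is called.
// Note: GetConsumingEnumerable blocks its thread while waiting for items.
var consumers = Enumerable.Range(0, 8).Select(_ => Task.Run(async () =>
{
    foreach (var file in queue.GetConsumingEnumerable())
        await UploadFileAsync(file);
})).ToArray();

// Producer: Add blocks whenever 1000 items are pending, so memory
// stays bounded regardless of the total number of files.
foreach (var file in filesToUpload)
    queue.Add(file);
queue.CompleteAdding();

await Task.WhenAll(consumers);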

erdomke