
I have a set of 100 Tasks that need to run, in any order. Putting them all into a Task.WhenAll() tends to overload the back end, which I do not control.

I'd like to run n tasks at a time and, after each group completes, run the next set. I wrote this code, but the "Running..." console messages are all printed to the screen after the tasks have run, which makes me think all the Tasks are being run at once.

How can I force the system to really "wait" for each group of Tasks?

//Run some X at a time
int howManytoRunAtATimeSoWeDontOverload = 4;
for(int i = 0; i < tasks.Count; i++)
{
    var startIndex = howManytoRunAtATimeSoWeDontOverload * i;
    Console.WriteLine($"Running {startIndex} to {startIndex+ howManytoRunAtATimeSoWeDontOverload}");

    var toDo = tasks.Skip(startIndex).Take(howManytoRunAtATimeSoWeDontOverload).ToArray();
    if (toDo.Length == 0) break;
    await Task.WhenAll(toDo);
}

Screen Output:

  • You have 100 tasks running and you don't know in what order they will complete. WhenAll doesn't start the tasks; it just waits for them to finish. It sounds like what you really want is to throttle the starts. Batch the starts in groups and wait on that group to complete before starting the next group (a sketch of that idea follows these comments). – Dweeberly Aug 09 '19 at 17:01
  • Related: [How to limit the amount of concurrent async I/O operations?](https://stackoverflow.com/questions/10806951/how-to-limit-the-amount-of-concurrent-async-i-o-operations) – Theodor Zoulias Aug 09 '19 at 19:46
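
A minimal sketch of the batching idea from the first comment, assuming the work can be expressed as task factories (`Func<Task>`) so that each task is started only when its batch begins; `taskFactories` and `batchSize` are illustrative names, not from the question:

int batchSize = 4;

for (int startIndex = 0; startIndex < taskFactories.Count; startIndex += batchSize)
{
    // Start only this batch; earlier batches have already completed,
    // later batches have not been started yet.
    var batch = taskFactories
        .Skip(startIndex)
        .Take(batchSize)
        .Select(factory => factory())
        .ToArray();

    Console.WriteLine($"Running {startIndex} to {startIndex + batch.Length}");
    await Task.WhenAll(batch);
}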

1 Answer


There are a lot of ways to do this, but I would probably use a library or framework that provides a higher-level abstraction, like TPL Dataflow: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library (if you're using .NET Core, there's a newer library).

This makes it a lot easier than building your own buffering mechanism. Below is a very simple example, but you can configure it differently and do a lot more with this library. In the example below I don't batch the tasks, but I make sure no more than 10 of them are processed at the same time.

        // The ActionBlock awaits each task it receives. BoundedCapacity caps how many
        // items can be queued in the block, and MaxDegreeOfParallelism = 1 means the
        // block awaits them one at a time.
        var buffer = new ActionBlock<Task>(async t =>
        {
            await t;
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 10, MaxDegreeOfParallelism = 1 });

        foreach (var t in tasks)
        {
            // SendAsync does not complete until the block has room for another item.
            await buffer.SendAsync(DummyFunctionAsync(t));
        }

        buffer.Complete();
        await buffer.Completion;
vdL
  • Upvoted, but changed my mind. `foreach (var t in tasks)` implies that there is a list of tasks that have already started, and your code just awaits them in batches, making no real difference compared to awaiting them one-by-one, or all at once. Maybe you intended to write `foreach (var i in items)`, but in that case it makes more sense to create the tasks inside the action and throttle with `MaxDegreeOfParallelism` ([example](https://stackoverflow.com/a/32048327/11178549)) than to create the tasks during `SendAsync` and throttle with `BoundedCapacity`; a sketch of that alternative follows these comments. – Theodor Zoulias Aug 09 '19 at 20:00
  • @TheodorZoulias: You have a valid point but I don't know much about the source of the tasks in the original question (threadpool or some IO operation). So I decided to go with this example. My main point is that he should look for some higher level abstractions like the ones that TPL Dataflow provides. – vdL Aug 09 '19 at 20:24
  • To be honest your example ensures that the tasks will be created sequentially (assuming that the `DummyFunctionAsync` is a task-factory function), which may be desirable in some scenarios. Moving the task creation inside the action will cause the tasks to be created concurrently, raising concerns about thread-safety inside `DummyFunctionAsync`. So your code may be preferable. – Theodor Zoulias Aug 09 '19 at 22:16
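
For reference, a minimal sketch of the alternative discussed in these comments, where the work item itself is posted and the task is created inside the action, throttled with `MaxDegreeOfParallelism`; the `Item` type, the `items` collection, and the `DummyFunctionAsync(Item)` signature are assumptions, not from the answer:

        // At most 10 invocations of the action run concurrently, so at most
        // 10 tasks created by DummyFunctionAsync exist at any moment.
        var block = new ActionBlock<Item>(
            async item => await DummyFunctionAsync(item),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

        foreach (var item in items)
        {
            block.Post(item);
        }

        block.Complete();
        await block.Completion;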