Does WhenAll add unnecessary delay? Also, what is causing the discrete spikes in execution time (C#/.NET)

Question

What is causing these discrete spikes in execution time when waiting for Tasks to complete? And why is using WhenAll slower than just looping really quickly and checking if all the tasks are complete? This is a simplified example, but it was created because we saw seemingly unnecessary delay when calling WhenAll. NativeAOT doesn't seem to be as affected, so maybe it is some unnecessary JIT compilation?

using System.Diagnostics;
namespace ConsoleApp1
{
    internal class Program
    {
        static async Task Main()
        {
            for (int i = 1; i <= 100; i+=1) 
            {
                await RunTest(i);
            }
        }

        public static async Task RunTest(int count) 
        {
            var sw = Stopwatch.StartNew();
            var tasks = new List<Task>();

            // Construct started tasks
            for (int i = 0; i < count; i++)
            {
                tasks.Add(Task.Run(() => Thread.Sleep(250)));
            }

            // Test 1, WhenAll
            //await Task.WhenAll(tasks);

            // Test 2, 10ms loop
            bool completed = false;
            while (!completed)
            {
                await Task.Delay(10);
                completed = tasks.All(t => t.IsCompleted);
            }

            Console.WriteLine($"{count},{sw.Elapsed.TotalSeconds}");
        }
    }
}

The data above was all collected by running a self-contained exe from the command line. When run in VS, it doesn't seem to be a garbage collection issue, which I had suspected, since the VS diagnostics don't show any garbage collection marks.

EDIT: Once I put real load on the CPU instead of sleeping, the differences and discrete bumps disappeared.

There is an excellent tool for micro-benchmarking, [BenchmarkDotNet](https://benchmarkdotnet.org/articles/overview.html), that I would recommend for such side-by-side comparisons — alexm, Jun 14 '22 at 21:08
Could you add the line `ThreadPool.SetMinThreads(500, 500);` at the start of the program, and see if it makes any difference? — Theodor Zoulias, Jun 14 '22 at 22:09
@TheodorZoulias That worked! With 500 it never slowed down. I then tried 90 and sure enough the time went up from .25 to .50 right at 90 threads. So I guess the answer is that the thread pool's algorithm was waiting before creating more threads, and the number of threads available determined how many tasks had to finish before the next one could run. Thanks! That explains the discrete steps up, but I still don't understand why WhenAll is worse. — Beau Gosse, Jun 14 '22 at 23:24

JohanP · Answer 1 · 2022-06-15T01:27:40.163

Thread.Sleep is a blocking operation. You are pushing a lot of work onto the thread pool, but each thread then sits for 250ms doing nothing. IIRC, your thread pool starts off with a number of threads based on the number of cores your machine has, but don't quote me on that. If, for argument's sake, you have a 4 core machine, your pool only has 4 threads. You are pushing hundreds of pieces of work onto these 4 threads and then blocking them. The runtime sees this pressure build up and, as @Theodor Zoulias has said, more threads get created and added to the pool at a rate of 1 per second, which is not a lot. If you change

tasks.Add(Task.Run(() => Thread.Sleep(250)));

to

tasks.Add(Task.Run(() => Task.Delay(250)));

or better yet

tasks.Add(Task.Delay(250));

You will almost certainly see your issues disappear.
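To make the difference concrete, here is a minimal sketch that times both variants side by side. The class and method names (`DelayVersusSleep`, `TimeDelayAsync`, `TimeSleepAsync`) are made up for illustration, and the numbers it prints will depend on your core count and runtime version.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DelayVersusSleep
{
    // Starts `count` timer-based delays; none of them holds a pool thread while waiting.
    public static async Task<TimeSpan> TimeDelayAsync(int count, int ms)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count).Select(_ => Task.Delay(ms)).ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    // Starts `count` work items that each block a pool thread for `ms`.
    public static async Task<TimeSpan> TimeSleepAsync(int count, int ms)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() => Thread.Sleep(ms)))
            .ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        var delay = await TimeDelayAsync(100, 250);
        var sleep = await TimeSleepAsync(100, 250);
        Console.WriteLine($"Task.Delay:   {delay.TotalSeconds:F2}s");
        Console.WriteLine($"Thread.Sleep: {sleep.TotalSeconds:F2}s");
    }
}
```

On a machine with far fewer than 100 cores, I would expect the Thread.Sleep line to come out noticeably slower, but treat that as something to measure rather than a guarantee.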

As to why Task.WhenAll is slower, you can have a look at its source: it does a BUNCH of work and doesn't just check a simple property.

Thanks, I can confirm `tasks.Add(Task.Delay(250));` does solve the problem as it's no longer blocking the threads. — Beau Gosse, Jun 15 '22 at 19:42

score 2 · Answer 2 · answered Jun 15 '22 at 02:21

The other answers have covered thread pool starvation. But why do the Task.Delay / Task.WhenAll graphs look different?

Every time you call RunTest you add another task. If all the tasks can run in parallel, RunTest will return in 250ms. If they can't, then the scheduled tasks will execute in batches and RunTest will return in n * 250ms, where n is the number of batches. Since, as others have noted, the thread pool will only add one new thread every second, and four 250ms batches fit into each second, you would expect the result to stabilise when n ~= 4.
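The batching arithmetic can be sketched with a deliberately simplified model. It assumes a fixed number of pool threads and ignores thread injection entirely, so it only illustrates where the discrete 250ms steps come from; `BatchModel` and `EstimateSeconds` are hypothetical names, not part of any library.

```csharp
using System;

public static class BatchModel
{
    // Assumption: `threadCount` threads are available and no new ones are added.
    // Tasks then run in ceil(taskCount / threadCount) batches of `sleepSeconds` each.
    public static double EstimateSeconds(int taskCount, int threadCount, double sleepSeconds = 0.25)
    {
        if (taskCount <= 0) return 0;
        if (threadCount <= 0) throw new ArgumentOutOfRangeException(nameof(threadCount));
        int batches = (int)Math.Ceiling(taskCount / (double)threadCount);
        return batches * sleepSeconds;
    }

    public static void Main()
    {
        // With 8 threads, the estimate steps up each time the task count crosses a multiple of 8.
        foreach (int count in new[] { 8, 9, 16, 17 })
        {
            Console.WriteLine($"{count},{EstimateSeconds(count, 8)}");
        }
    }
}
```

The real thread pool adds threads while the test is running, so measured times will sit below this estimate once the pool has started growing.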

When a task completes, any continuations will be run immediately on the same thread. Task.WhenAll adds a continuation to every task, decrementing a counter when each completes. On the last task, it will collect the results and complete its own task. So it doesn't add any pressure to the thread pool.
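The counter idea can be sketched roughly as follows. This is not the actual Task.WhenAll implementation: `SimpleWhenAll` is a hypothetical stand-in that ignores exceptions, cancellation and result collection, and only shows how the last finishing task completes the combined task from its own thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Simplified illustration only: completes when every input task has finished,
    // regardless of whether it succeeded, faulted or was canceled.
    public static Task SimpleWhenAll(IReadOnlyCollection<Task> tasks)
    {
        var tcs = new TaskCompletionSource();
        int remaining = tasks.Count;
        if (remaining == 0)
        {
            tcs.SetResult();
            return tcs.Task;
        }

        foreach (var task in tasks)
        {
            // ExecuteSynchronously asks for the continuation to run on the thread
            // that completes the task, rather than being queued as new pool work.
            task.ContinueWith(
                _ =>
                {
                    if (Interlocked.Decrement(ref remaining) == 0) tcs.SetResult();
                },
                CancellationToken.None,
                TaskContinuationOptions.ExecuteSynchronously,
                TaskScheduler.Default);
        }

        return tcs.Task;
    }

    public static async Task Main()
    {
        var tasks = Enumerable.Range(1, 5).Select(i => Task.Delay(i * 20)).ToList();
        await SimpleWhenAll(tasks);
        Console.WriteLine(tasks.All(t => t.IsCompleted));
    }
}
```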

However, running an additional task every 10ms does add pressure to the thread pool. This extra task ensures that the thread pool is considered busy even when only k-1 threads are otherwise busy, which makes it much more likely that a new thread will be created every second and changes the behaviour slightly. However, when the thread pool is busy, it's likely that the timer will only execute when a batch of tasks completes, or during the last batch of tasks.

Thank you! Based on your answer, I gathered that the test is really just forcing thread pool starvation, which the 10ms check accidentally alleviates by creating MORE pressure on the thread pool. This gave me the idea to run the tests again, but instead of calling Thread.Sleep, I actually burned some CPU cycles (checking values of DateTime.Now). When the CPU was actually under load running work on the threads, there was no difference between WhenAll and the 10ms check. Also, the discrete bumps disappeared and instead the time just grew linearly. I updated the original question with the updated graph. — Beau Gosse, Jun 15 '22 at 20:00
Sleeping threads don't keep the CPU busy as the OS can swap them out. Such an ideal workload does make it easy to describe the graph. Obviously you only have a finite number of real cores which the OS tries to balance. The result is much harder to predict. — Jeremy Lakeman, Jun 15 '22 at 23:58

score 1 · Answer 3 · answered Jun 15 '22 at 00:04

Apparently you are observing the effects of a saturated ThreadPool. When the ThreadPool is saturated, its behavior is to accommodate the demand by spawning new threads at a rate of one new thread per second¹. It seems that Task.WhenAll is affected by the starvation more severely than the polling technique while (!completed), for reasons that I am honestly not in a position to explain in detail.
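One way to probe this is to inspect the pool's minimum thread counts and raise them, as was suggested in the comments on the question. The sketch below uses the value 500 from that comment; the `PoolSettings` class and `RaiseMinimum` helper are illustrative names only.

```csharp
using System;
using System.Threading;

public static class PoolSettings
{
    // Tries to raise both minimums to `count` and reports whether the runtime accepted it.
    public static bool RaiseMinimum(int count)
    {
        return ThreadPool.SetMinThreads(count, count);
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"Minimums before: worker={worker}, io={io}");

        // SetMinThreads returns false if the request is rejected,
        // for example when it exceeds the configured maximum.
        bool accepted = RaiseMinimum(500);

        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine($"Accepted={accepted}; minimums after: worker={worker}, io={io}");
    }
}
```

Raising the minimum is useful as a diagnostic, but it hides the starvation rather than removing the blocking that causes it.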

The moral lesson is that saturating the ThreadPool is a bad situation, and should be avoided. If your application has an insatiable desire for threads, threads and more threads, you could consider creating dedicated threads for each LongRunning operation, instead of borrowing them from the ThreadPool. The ThreadPool is intended as a small pool of reusable threads, to help amortize the cost of running frequent and lightweight operations like callbacks, continuations, event handlers etc.
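A minimal sketch of the dedicated-thread approach, using `TaskCreationOptions.LongRunning` with `Task.Factory.StartNew`. The option is only a hint to the scheduler; in current .NET implementations the default scheduler responds by starting a separate thread per task, but that is an implementation detail. `DedicatedThreads` and `RunBlockingWork` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DedicatedThreads
{
    // Starts `count` blocking operations, each hinted as long-running so that
    // they do not occupy the shared pool threads while sleeping.
    public static Task<bool>[] RunBlockingWork(int count, int ms)
    {
        return Enumerable.Range(0, count)
            .Select(_ => Task.Factory.StartNew(
                () =>
                {
                    Thread.Sleep(ms);
                    // Report whether this work ended up on a pool thread.
                    return Thread.CurrentThread.IsThreadPoolThread;
                },
                CancellationToken.None,
                TaskCreationOptions.LongRunning,
                TaskScheduler.Default))
            .ToArray();
    }

    public static async Task Main()
    {
        bool[] onPool = await Task.WhenAll(RunBlockingWork(50, 250));
        Console.WriteLine($"Ran on pool threads: {onPool.Count(x => x)} of {onPool.Length}");
    }
}
```

Creating a thread per operation has its own cost, so this trades pool starvation for thread creation overhead and memory.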

¹ This is the .NET 6 behavior, and it's not documented. It can be observed experimentally, but it might change in future .NET versions.
