How does the Parallel class dynamically adjust the level of parallelism?

Question

What feedback does TPL use to dynamically adjust the number of worker threads?

My previous understanding was that it measures the rate of task completion to see if adding or removing threads is worth it. But then, why does this code keep increasing the number of threads, even though there is a bottleneck introduced by a semaphore?

Surely, there can be no more than 20 task completions per second, and more than 2 threads will not improve that.

var activeThreads = 0;
var semaphore = new SemaphoreSlim(2);
var res = Parallel.For(0, 1_000_000, i =>
{
    Interlocked.Increment(ref activeThreads);
    semaphore.Wait();
    try
    {
        Thread.Sleep(100);
        Console.WriteLine("Threads: " + activeThreads);
    }
    finally
    {
        Interlocked.Decrement(ref activeThreads);
        semaphore.Release();
    }
});

*...it measures the rate of task completion to see if adding or removing threads is worth it* I don't believe this is true. How would it make such a decision? "worth it" is very subjective — Liam, Jul 24 '19 at 11:58
I've never seen a semaphore released within the thread itself. Maybe that's causing some odd behavior? — Blue Eyed Behemoth, Jul 24 '19 at 11:58
@Liam Basically, by trying to add another thread and measuring if the rate of task completion increased. If it decreases, drop a thread. If it increased, try another one. Then measure again and repeat. — relatively_random, Jul 24 '19 at 12:00
Yeah, it doesn't do that. This will create a load of threads but then block all but two of them. If you only want two threads you should specify the [`MaxDegreeOfParallelism`](https://stackoverflow.com/questions/15931346/how-to-configure-a-maximum-number-of-threads-in-a-parallel-for). `Parallel.For` isn't as clever as you think it is. — Liam, Jul 24 '19 at 12:00
@BlueEyedBehemoth I'm just using it as a mutex-for-two here. — relatively_random, Jul 24 '19 at 12:01
Here's the [source code](https://referencesource.microsoft.com/#mscorlib/system/threading/tasks/Parallel.cs,ceb354fb1788c81a) that actually executes the loop, you'll see no fancy monitoring going on here. It's not aware of your `SemaphoreSlim` at all — Liam, Jul 24 '19 at 12:03
@Liam Then why does it take it so long to ramp up the thread count? There are 6 cores free on my machine. They should be able to spawn hundreds of threads pretty quickly. — relatively_random, Jul 24 '19 at 12:08
There are some [hard limits](https://stackoverflow.com/questions/145312/maximum-number-of-threads-in-a-net-app) as well — Liam, Jul 24 '19 at 12:13
I'm watching the output with my own eyes. It starts with 8 or 9 threads, then keeps adding one every second or so. — relatively_random, Jul 24 '19 at 12:13
`But then, why does this code keep increasing the number of threads, even though there is a bottleneck introduced by a semaphore?` You have effectively written something that is very slow (from `Parallel`'s point of view). It can't **see** why it is slow - but it knows you want it done in parallel. It can't make it go any faster, so it does the only thing it can - throw threads at the problem. — mjwills, Jul 24 '19 at 12:15
@Liam That doesn't make any sense. The fact that all of the threads except two are blocked means that they are not using the CPU at all. The thread pool has plenty of computing power to spawn hundreds of threads instantly. And yet it doesn't do that. There must be something smarter than just "spawn all the threads" going on. — relatively_random, Jul 24 '19 at 12:15
@Liam Then why doesn't it spawn all one million workers I seem to have requested? We seem to be repeating ourselves. — relatively_random, Jul 24 '19 at 12:17
That is a different question @relatively_random, and a common point of confusion. The thread pool is conservative about how many threads it spins up in a window of time. Which is why it doesn't spin up a million of them at once. And that is putting aside the fact that if it did that it wouldn't be much of a **pool**. :) If you want to "speed up" the initial ramp up of the thread pool, call https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setminthreads?view=netframework-4.8 . — mjwills, Jul 24 '19 at 12:18
Yes we are repeating ourselves. Read what I've already written, there is nothing else to add. — Liam, Jul 24 '19 at 12:18
Give https://stackoverflow.com/questions/53188758/c-sharp-system-timers-timer-odd-behavior and https://stackoverflow.com/a/10298760/34092 a read. It isn't the same _problem_ - but it discusses a little how the thread pool is acting. Which is part of what is surprising you. — mjwills, Jul 24 '19 at 12:22
I suspect your issue here is that you are assuming that `Parallel` has the smarts (working out when to ask for new `Task`s etc). In my understanding, `Parallel` is relatively dumb. It is the **thread pool** that has the smarts. `Parallel` can ask for as many as it wants, and the **thread pool** will control how many it actually **gets** at a time (and how many new ones it spins up etc etc). _I mention this because your argument as to why other people are wrong might be that you aren't clear on where `Parallel`'s responsibilities end, and the thread pool's responsibilities begin._ — mjwills, Jul 24 '19 at 12:24
@mjwills You're right, I'm just adding to the confusion. I'll tone down the `Parallel` references. — relatively_random, Jul 24 '19 at 13:00
@mjwills Reverted, posted the new question: https://stackoverflow.com/questions/57183938/how-does-the-tpl-dynamically-adjust-the-level-of-parallelism — relatively_random, Jul 24 '19 at 13:20
This seems like an artificial example that is missing its key component: the computation to be performed by the task. Why are you using a semaphore? What resource do you only have 2 of? Why are you faking work with a Sleep? You should share the _actual_ problem you are trying to solve. Your example is the problem at this point because it does nothing. What do you want to do with `i`? — Wyck, Jul 24 '19 at 15:10

score 1 · Answer 1 · answered Jul 24 '19 at 12:43

I believe the ParallelOptions is what you are looking for to specify the amount of parallelism.

 Parallel.For(0, 1000, new ParallelOptions
        {
            MaxDegreeOfParallelism = 2
        }, i => { Console.WriteLine(i); });

Personally, I think the TPL library will work in a lot of cases, but it isn't really smart about execution distribution (pardon my english). Whenever you have bottlenecks in the execution of your application, have a look at the pipeline pattern for example. Here is a link that describes different approaches to parallel execution very well imo: https://www.dotnetcurry.com/patterns-practices/1407/producer-consumer-pattern-dotnet-csharp

Wyck · Answer 2 · 2019-07-24T13:32:07.930

TL;DR: The thing that you are doing in your code that the TPL uses to justify creating a new thread is blocking. (Synchronizing or sleeping, or performing I/O would all count as blocking.)

A longer explanation...

When your task runs, it takes its thread hostage for 100 ms (because you Sleep(100)). While you are sleeping, that thread cannot be used to run other tasks because it would risk not being in a runnable state when the sleep time period expires. Typically we sleep rather than perform an asynchronous action because we need to keep our call stack intact. We are therefore relying on the stack to maintain our state. And the stack is a one-of-a-kind resource for the thread. (There's not actually a lot more to a thread than its stack.)

So the TPL (Thread pool, specifically) tries to keep occupancy high but the thread count low. One way it achieves this is by making sure that there are approximately just as many runnable threads in the system as there are virtual processors. Each time it needs to increase the thread count, it must create a relatively expensive stack for the thread, so it's best not to have so many. And a thread that is not runnable cannot be scheduled, so when the CPU becomes free, you need something to schedule to make use of the processing resources available. If the thread is sleeping, it cannot be scheduled to run. So instead, a thread will be added to the thread pool and the next task will be scheduled on it.

When you are writing parallel code like this (as in your parallel for loop) that can be partitioned and managed by the TPL you should be careful about putting your thread into a non-runnable state. Performing synchronous I/O, waiting for a synchronization object (e.g. semaphore, event or mutex etc.), or sleeping will put the thread into a state where nothing else can be done on the thread until the I/O completes, the sleep interval expires, or the synchronization object becomes signalled. The thread is no good to the TPL during this period.

In your case, you do several of these things: you wait on a semaphore, you sleep, and you perform I/O by writing to the console. The first thing is waiting on that semaphore. If it's not signalled, then you immediately have the situation where the thread is not runnable and the next task of your million-or-so tasks that need to be run must be scheduled on a different thread. If there isn't one, then the TPL can justify creating a new thread to get more tasks started. After-all, what if it's thread #987,321 that will actually wind up setting the semaphore to unblock task #1? The TPL doesn't know what your code does, so it can delay creating threads for a while in the spirit of efficiency, but for correctness, ultimately it will have to create more threads to start chipping away at the task list. There is a complex, implementation-specific heuristic that it applies to monitor, predict and otherwise get this efficiency guess right.

Now your specific question actually asked what feedback does it use to adjust the number of threads. Like I said, the actual implementation is complex and you should probably think of it as a black-box. But in a nutshell, if there are no runnable threads, it may create another thread to keep chipping away at the task list (or may wait a while before doing so, hoping that things will free up), and if there are too many idle threads, it will terminate the idle threads to reclaim their resources.

And to reiterate, as I said at the top, and to hopefully answer your question this time, the one thing you do that allows the TPL to justify creating a new thread is to block. ...even on that first semaphore.

I didn't think it would matter whether I blocked or spinned, but apparently it does. Replacing the semaphore and sleep with `Interlocked.CompareExchange` loops and only writing to console when the maximum increases seems to have limited it to ~10 threads. — relatively_random, Jul 24 '19 at 14:36

score 0 · Accepted Answer · answered Aug 01 '19 at 11:43

Ran into an article analysing the thread injection algorithm in 2017. As of 2019-08-01, the hillclimbing.cpp file on GitHub hasn't really changed so the article should still be up to date.

Relevant details:

The .NET thread pool has two main mechanisms for injecting threads: a starvation-avoidance mechanism that adds worker threads if it sees no progress being made on queued items and a hill-climbing heuristic that tries to maximize throughput while using as few threads as possible.

...

It calculates the desired number of threads based on the ‘current throughput’, which is the ‘# of tasks completed’ (numCompletions) during the current time-period (sampleDuration in seconds).

...

It also takes the current thread count (currentThreadCount) into consideration.

...

The real .NET Thread Pool only increases the thread-count by one thread every 500 milliseconds. It keeps doing this until the ‘# of threads’ has reached the amount that the hill-climbing algorithm suggests.

...

The [hill-climbing] algorithm only returns values that respect the limits specified by ThreadPool.SetMinThreads(..) and ThreadPool.SetMaxThreads(..)

...

In addition, [the hill-climbing algorithm] will only recommend increasing the thread count if the CPU Utilization is below 95%

So it turns out the thread pool does have a feedback mechanism based on task completion rate. It also doesn't explicitly check whether its threads are blocked or running, but it does keep an eye on overall CPU utilization to detect blockages. All this also means it should be roughly aware of what the other threads and processes are doing.

On the other hand, it will always eagerly spawn at least as many threads as told by ThreadPool.SetMinThreads(), which defaults to the number of logical processors on the machine.

In conclusion, the test code in question was doing two things which make it keep piling up more threads:

there are lots of tasks queued up and sitting in the queue for ages, which indicates starvation
CPU utilization is negligible, which means that a new thread should be able to use it

How does the Parallel class dynamically adjust the level of parallelism?

3 Answers3

A longer explanation...