
In a .NET process, there is only one managed thread pool. We can set the minimum and maximum thread counts as needed via the ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads methods.

In .NET, we also have Parallel.ForEach, which gets its threads from this managed thread pool under the hood.

In Parallel.ForEach we can also set MaxDegreeOfParallelism to limit the maximum number of threads it uses.

I have two Parallel.ForEach loops running in parallel. One has MaxDegreeOfParallelism set to 3 and the other has it set to 7.

My question is: Do both my Parallel.ForEach loops use the same thread pool under the hood? If yes, how does Parallel.ForEach limit the threads with MaxDegreeOfParallelism? How do multiple Parallel.ForEach loops and one managed thread pool work together? It would really help if you could provide a high-level explanation or some pointers before I peek into the .NET Core source code.
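For clarity, here is a minimal sketch of the setup (the Thread.Sleep calls stand in for the real work):

// Two Parallel.ForEach loops running side by side, one limited to 3 threads
// and the other to 7. Thread.Sleep stands in for the real work.
Task loopA = Task.Run(() =>
    Parallel.ForEach(Enumerable.Range(1, 100),
        new ParallelOptions() { MaxDegreeOfParallelism = 3 },
        item => Thread.Sleep(100)));

Task loopB = Task.Run(() =>
    Parallel.ForEach(Enumerable.Range(1, 100),
        new ParallelOptions() { MaxDegreeOfParallelism = 7 },
        item => Thread.Sleep(100)));

Task.WaitAll(loopA, loopB);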

  • The question doesn't really matter (yes, it's the same). `Parallel.ForEach` uses *all available cores* to process tons of data so a nested `Parallel.ForEach` won't find any free core to use. That's not a bug. `Parallel.ForEach` is meant for *data parallelism* - processing a lot of in-memory data by partitioning it and using a separate worker thread/task to process each partition. – Panagiotis Kanavos Feb 18 '21 at 13:37
  • What are you trying to do and why do you have nested parallel loops? Either the code is wrong, or the wrong construct is used for the job. For example, `Parallel.ForEach` has no support for async operations because it makes no sense for data parallelism. There are other classes for throttled concurrent or async processing, like `ActionBlock` – Panagiotis Kanavos Feb 18 '21 at 13:39
  • @TheodorZoulias doesn't matter either `I have two parallel.ForEach running in parrallel`. Those are still going to use all the cores they can. Perhaps with careful configuration one can make sure the CPU isn't saturated, but *why*? Is this really a data parallelism problem? Or a misuse of `Parallel.ForEach` ? – Panagiotis Kanavos Feb 18 '21 at 13:43
  • Does this change the answer in any way? `Parallel.ForEach` uses the *cores*, through worker tasks and the threadpool. One threadpool or more, it's going to try to keep using its worker tasks for as long as possible - unless configured to release the tasks after a while. So the specifics matter. The number of threadpools, not really – Panagiotis Kanavos Feb 18 '21 at 14:00
  • @ShahryarRazzak it doesn't matter, `Parallel.ForEach` will keep the cores busy no matter how many threadpools there are. You can configure it to use different threadpools but that won't free up the cores. What problem are you trying to solve? – Panagiotis Kanavos Feb 18 '21 at 14:06
  • Yes, loops are NOT nested. They are running in parallel, side by side. They have different MaxDegreeOfParallelism values, as mentioned in the question. You can assume there are 16 cores. – Shahryar Razzak Feb 18 '21 at 14:15
  • This type of discussion is only useful to the community if an [mre] is included to explain the relationship between the parallel queries; lots of assumptions are being made that won't be clear to new developers. – Chris Schaller Jul 21 '21 at 07:29

2 Answers

  • Do both my Parallel.ForEach loops use the same thread pool under the hood?

    Yes

  • How does Parallel.ForEach limit the threads with MaxDegreeOfParallelism?

    ParallelOptions.MaxDegreeOfParallelism: Gets or sets the maximum number of concurrent tasks enabled by this ParallelOptions instance.

    By default, methods on the Parallel class attempt to use all available processors, are non-cancelable, and target the default TaskScheduler (TaskScheduler.Default). ParallelOptions enables overriding these defaults.

  • How do multiple Parallel.ForEach loops and one managed thread pool work together?

    They share the same thread pool, as described here:

    Generally, you do not need to modify this setting. However, you may choose to set it explicitly in advanced usage scenarios such as these:

    When you're running multiple algorithms concurrently and want to manually define how much of the system each algorithm can utilize. You can set a MaxDegreeOfParallelism value for each.

Gleb

By default a Parallel.ForEach loop uses threads from the ThreadPool, which is a static class, and there is only one per process. It is possible to modify this behavior by configuring the TaskScheduler property of the ParallelOptions. Creating a custom TaskScheduler that functions as an alternative ThreadPool is not exactly trivial, but not rocket science either. An implementation can be found here. If you want to learn more about custom task schedulers, you can read this article by Stephen Toub (code).
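For example, here is a rough sketch (not the implementation from the linked article) of plugging a scheduler into a loop via ParallelOptions.TaskScheduler. The ConcurrentExclusiveSchedulerPair used here still runs its work on ThreadPool threads; it only caps how many of them the loop can occupy at once:

// Rough sketch: routing a parallel loop through a concurrency-limited scheduler.
// The ConcurrentExclusiveSchedulerPair still uses ThreadPool threads under the
// hood; it merely caps how many worker tasks run concurrently.
var schedulerPair = new ConcurrentExclusiveSchedulerPair(
    TaskScheduler.Default, maxConcurrencyLevel: 2);

var options = new ParallelOptions()
{
    TaskScheduler = schedulerPair.ConcurrentScheduler,
    // Also capping the MaxDegreeOfParallelism, so that the loop doesn't queue
    // more worker tasks than the scheduler is willing to run.
    MaxDegreeOfParallelism = 2,
};

Parallel.ForEach(Enumerable.Range(1, 10), options, item =>
{
    Console.WriteLine($"Processing #{item} on thread {Thread.CurrentThread.ManagedThreadId}");
    Thread.Sleep(500); // placeholder work
});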

Now what happens when two parallel loops are running concurrently is that both schedule work on ThreadPool threads. If they are both configured with a specific MaxDegreeOfParallelism, and the sum of both does not exceed the minimum number of threads that the ThreadPool creates on demand¹, then the two loops are not going to interfere with each other regarding their scheduling. Of course it is still possible for them to compete with each other for CPU resources, in case these are scarce. In that case the operating system will be the arbiter.
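For instance, with the 3 + 7 configuration from the question, raising the ThreadPool minimum above the sum should keep the two loops from waiting on thread injection (a sketch; the exact numbers are up to you):

// Sketch: with MaxDegreeOfParallelism 3 and 7 (sum = 10), a minimum of at
// least 10 worker threads lets both loops get all their threads on demand,
// without waiting for the ThreadPool's slow thread-injection mechanism.
ThreadPool.SetMinThreads(workerThreads: 12, completionPortThreads: 12);

var optionsA = new ParallelOptions { MaxDegreeOfParallelism = 3 };
var optionsB = new ParallelOptions { MaxDegreeOfParallelism = 7 };
// ...then start the two loops concurrently with optionsA and optionsB.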

In case at least one of the parallel loops is not configured with a specific MaxDegreeOfParallelism, the effective default of this option is -1, which means unbounded parallelism. This will cause the ThreadPool to become immediately saturated, and to remain saturated until the source enumerable of the unconfigured parallel loop completes. During this period the two parallel loops will interfere heavily with each other, and who is going to get the extra thread that the saturated ThreadPool injects every ~1,000 msec is a matter of who asked for it first. On top of that, a saturated ThreadPool negatively affects any other independent callbacks, timer events, async continuations etc. that may also be active during this period.

In case both parallel loops are configured, and the sum of the MaxDegreeOfParallelism of both exceeds the number of available threads, then it's a similar situation as before. The only difference is that the number of threads in the ThreadPool will gradually increase, so the saturation incident may end before the execution of the parallel loops completes.

Below is an example that demonstrates this behavior:

// Restrict the ThreadPool to creating only 4 threads on demand, so that the
// slow thread-injection mechanism (~1 extra thread per second) becomes visible.
ThreadPool.SetMinThreads(4, 4);
Task[] tasks = new[] { 'A', 'B' }.Select(name => Task.Run(() =>
{
    Thread.Sleep(100); if (name == 'B') Thread.Sleep(500); // loop B starts ~500 msec after loop A
    Print($"{name}-Starting");
    var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 }; // each loop asks for up to 10 threads
    Parallel.ForEach(Enumerable.Range(1, 10), options, item =>
    {
        Print($"{name}-Processing #{item}");
        Thread.Sleep(1000);
    });
    Print($"{name}-Finished");
})).ToArray();
Task.WaitAll(tasks);

static void Print(string line)
{
    Console.WriteLine($@"{DateTime.Now:HH:mm:ss.fff} [{Thread.CurrentThread
        .ManagedThreadId}] > {line}");
}

Output:

15:34:20.054 [4] > A-Starting
15:34:20.133 [6] > A-Processing #2
15:34:20.133 [7] > A-Processing #3
15:34:20.133 [4] > A-Processing #1
15:34:20.552 [5] > B-Starting
15:34:20.553 [5] > B-Processing #1
15:34:20.956 [8] > A-Processing #4
15:34:21.133 [4] > A-Processing #5
15:34:21.133 [7] > A-Processing #6
15:34:21.133 [6] > A-Processing #7
15:34:21.553 [5] > B-Processing #2
15:34:21.957 [8] > A-Processing #8
15:34:21.957 [9] > A-Processing #9
15:34:22.133 [4] > A-Processing #10
15:34:22.134 [7] > B-Processing #3
15:34:22.134 [6] > B-Processing #4
15:34:22.553 [5] > B-Processing #5
15:34:22.957 [8] > B-Processing #6
15:34:22.958 [9] > B-Processing #7
15:34:23.134 [4] > A-Finished
15:34:23.134 [4] > B-Processing #8
15:34:23.135 [7] > B-Processing #9
15:34:23.135 [6] > B-Processing #10
15:34:24.135 [5] > B-Finished

(Try it on Fiddle)

You can see that the parallel loop A initially utilizes 3 threads (the threads 4, 6 and 7), while the parallel loop B utilizes only the thread 5. At that point the ThreadPool is saturated. Around 500 msec later the new thread 8 is injected, and it is taken by the A loop. The B loop still has only one thread. Another second later one more thread, the thread 9, is injected. This too goes to the loop A, setting the score at 5-1 in favor of the loop A. There is no politeness or courtesy in this battle. It's a wild competition for limited resources. If you expect to have more than one parallel loop running at the same time, make sure that all of them have their MaxDegreeOfParallelism option configured, and that the ThreadPool can create enough threads on demand to accommodate all of them.

¹ It is configured by the method ThreadPool.SetMinThreads, and AFAIK by default is equal to Environment.ProcessorCount.
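A quick way to check both values on a given machine:

// Check the ThreadPool minimum and the processor count on the current machine.
ThreadPool.GetMinThreads(out int minWorkerThreads, out int minCompletionPortThreads);
Console.WriteLine($"MinWorkerThreads: {minWorkerThreads}, ProcessorCount: {Environment.ProcessorCount}");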


Note: The above text describes the existing behavior of the static Parallel class (.NET 5). Parallelism achieved through PLINQ (the AsParallel LINQ operator) does not have the same behavior in all aspects. Also, in the future the Parallel class may get new methods with different defaults.
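For comparison, a minimal PLINQ sketch (with a placeholder projection) looks like this; PLINQ has its own WithDegreeOfParallelism operator, but as noted above its scheduling behavior is not identical to that of the Parallel class:

// Minimal PLINQ sketch for comparison. The Select projection is a placeholder.
int[] results = Enumerable.Range(1, 100)
    .AsParallel()
    .WithDegreeOfParallelism(3)
    .Select(item => item * 2) // placeholder work
    .ToArray();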


.NET 6 update: The above example now produces a different output. The score ends up being only 3-2 in favor of the loop A:

04:34:47.894 [4] > A-Starting
04:34:47.926 [8] > A-Processing #1
04:34:47.926 [7] > A-Processing #2
04:34:47.926 [4] > A-Processing #3
04:34:48.392 [6] > B-Starting
04:34:48.393 [6] > B-Processing #1
04:34:48.792 [9] > B-Processing #2
04:34:48.927 [4] > A-Processing #4
04:34:48.927 [8] > A-Processing #5
04:34:48.927 [7] > A-Processing #6
04:34:49.393 [6] > B-Processing #3
04:34:49.792 [9] > B-Processing #4
04:34:49.927 [4] > A-Processing #7
04:34:49.927 [8] > A-Processing #8
04:34:49.928 [7] > A-Processing #9
04:34:50.393 [6] > B-Processing #5
04:34:50.792 [9] > B-Processing #6
04:34:50.927 [4] > A-Processing #10
04:34:50.928 [8] > B-Processing #8
04:34:50.928 [7] > B-Processing #7
04:34:51.393 [6] > B-Processing #9
04:34:51.928 [4] > A-Finished
04:34:52.393 [6] > B-Processing #10
04:34:53.394 [6] > B-Finished

The injected thread 9 is taken by the loop B instead of the loop A. It seems that the behavior of the Parallel class, or the ThreadPool, or both, has changed slightly in .NET 6. But I am not sure what exactly the changes are.

Theodor Zoulias