120

Given this code:

var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
    DoSomething(someString);
});

Will all 1000 threads spawn almost simultaneously?

Theodor Zoulias
Jader Dias

6 Answers

165

No, it won't start 1000 threads - and yes, it will limit how many threads are used. Parallel Extensions uses an appropriate number of cores, based on how many you physically have and how many are already busy. It allocates work to each core and then uses a technique called work stealing to let each thread process its own queue efficiently, only doing expensive cross-thread access when it really needs to.

Have a look at the PFX Team Blog for loads of information about how it allocates work and all kinds of other topics.

Note that in some cases you can specify the degree of parallelism you want, too.
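
For example, a minimal sketch (reusing arrayStrings and DoSomething from the question; the limit of 4 is an arbitrary choice) that caps how many iterations run concurrently:

var arrayStrings = new string[1000];
// Cap the number of iterations that may execute concurrently at 4.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(arrayStrings, options, someString =>
{
    DoSomething(someString);
});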

Jon Skeet
  • I was using Parallel.ForEach(FilePathArray, path =>... to read about 24,000 files tonight, creating one new file for each file I read in. Very simple code. It appears that even 6 threads were enough to overwhelm the 7200 RPM disk I was reading from at 100% utilization. Over the period of a few hours I watched the Parallel library spin off over 8,000 threads. I tested using MaxDegreeOfParallelism and sure enough the 8,000+ threads disappeared. I have tested it multiple times now with the same result. – Jake Drew Jun 24 '16 at 06:40
  • It *could* start 1000 threads for some degenerate 'DoSomething'. (As in the case where I'm currently dealing with an issue in production code that failed to set a limit and spawned 200+ threads thereby popping the SQL connection pool.. I recommend setting the Max DOP for any work that cannot be trivially reasoned about as being explicitly CPU bound.) – user2864740 Apr 11 '17 at 23:31
  • Partitioner - https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.partitioner?redirectedfrom=MSDN&view=netframework-4.8 – rafidheen Mar 24 '20 at 07:38
  • *"Parallel Extensions uses an appropriate number of cores, based on how many you physically have and how many are already busy."* -- Do you mean that the `Parallel.ForEach` takes into account the load imposed on the available cores, by other processes on the same machine? If yes, have you verified it experimentally? I would be highly skeptical to such a claim. – Theodor Zoulias Feb 06 '23 at 06:19
  • @TheodorZoulias: I honestly can't remember what I verified 13 years ago, I'm afraid. – Jon Skeet Feb 06 '23 at 07:08
  • OK, fair enough. I'll make the assumption that this answer contains information found in the PFX blog. Btw I am surprised that the link in the answer is still unbroken after 13+ years! – Theodor Zoulias Feb 06 '23 at 07:17
37

On a single core machine... Parallel.ForEach partitions (chunks) the collection it's working on between a number of threads, but that number is calculated by an algorithm that appears to continually monitor the work done by the threads it's allocating to the ForEach. So if the body of the ForEach calls out to long-running IO-bound/blocking functions that would leave a thread waiting around, the algorithm will spawn more threads and repartition the collection between them. If the threads complete quickly and don't block on IO, for example when simply calculating some numbers, the algorithm will ramp the number of threads up (or indeed down) to a point the algorithm considers optimal for throughput (average completion time of each iteration).

Basically, the thread pool behind all the various Parallel library functions works out an optimum number of threads to use. The number of physical processor cores forms only part of the equation. There is NOT a simple one-to-one relationship between the number of cores and the number of threads spawned.

I don't find the documentation around cancellation and the synchronization of threads very helpful. Hopefully MS can supply better examples in MSDN.

Don't forget that the body code must be written to run on multiple threads, with all the usual thread-safety considerations; the framework does not abstract that factor away... yet.
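
For instance, a minimal sketch of the kind of thread-safety issue the body can run into (the counters and class name here are hypothetical): an unsynchronized increment of a shared variable loses updates, while Interlocked.Increment does not.

using System;
using System.Threading;
using System.Threading.Tasks;

class BodyThreadSafetyDemo
{
    static void Main()
    {
        int unsafeCount = 0;
        int safeCount = 0;

        Parallel.ForEach(new string[100000], _ =>
        {
            unsafeCount++;                        // not thread-safe: increments can be lost
            Interlocked.Increment(ref safeCount); // thread-safe: atomic increment
        });

        Console.WriteLine($"Unsafe counter: {unsafeCount}"); // typically less than 100000
        Console.WriteLine($"Safe counter:   {safeCount}");   // always 100000
    }
}

Running this repeatedly, the unsynchronized counter typically loses some increments, which is exactly the kind of race the body has to guard against.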

user2864740
Microsoft Developer
  • "..if the body part of the ForEach calls out to long running blocking functions which would leave the thread waiting around, the algorithm will spawn up more threads.." - *In degenerate cases this means there might be as many threads created as allowed per the ThreadPool.* – user2864740 Apr 12 '17 at 19:26
  • You are right; for IO it may allocate 100+ threads, as I found when debugging it myself – FindOutIslamNow Aug 30 '18 at 10:29
13

Great question. In your example the level of parallelization is pretty low even on a quad-core processor, but with some waiting the level of parallelization can get quite high.

// Max concurrency: 5
[Test]
public void Memory_Operations()
{
    ConcurrentBag<int> monitor = new ConcurrentBag<int>();
    ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
    var arrayStrings = new string[1000];
    Parallel.ForEach<string>(arrayStrings, someString =>
    {
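        // monitor.Count at this point roughly equals the number of iterations currently in
        // flight; each iteration records that snapshot and removes one entry on the way out,
        // so the maximum value collected in monitorOut approximates the peak concurrency.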
        monitor.Add(monitor.Count);
        monitor.TryTake(out int result);
        monitorOut.Add(result);
    });

    Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}

Now look what happens when a waiting operation is added to simulate an HTTP request.

// Max concurrency: 34
[Test]
public void Waiting_Operations()
{
    ConcurrentBag<int> monitor = new ConcurrentBag<int>();
    ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
    var arrayStrings = new string[1000];
    Parallel.ForEach<string>(arrayStrings, someString =>
    {
        monitor.Add(monitor.Count);

        System.Threading.Thread.Sleep(1000);

        monitor.TryTake(out int result);
        monitorOut.Add(result);
    });

    Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}

I haven't changed any settings yet, and the level of concurrency/parallelization has jumped dramatically. The limit on concurrency can be raised with ParallelOptions.MaxDegreeOfParallelism.

// Max concurrency: 43
[Test]
public void Unlimited_Waiting_Operations()
{
    ConcurrentBag<int> monitor = new ConcurrentBag<int>();
    ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
    var arrayStrings = new string[1000];
    var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
    Parallel.ForEach<string>(arrayStrings, options, someString =>
    {
        monitor.Add(monitor.Count);

        System.Threading.Thread.Sleep(1000);

        monitor.TryTake(out int result);
        monitorOut.Add(result);
    });

    Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}

// Max concurrency: 391
[Test]
public void Longer_Waiting_Operations()
{
    ConcurrentBag<int> monitor = new ConcurrentBag<int>();
    ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
    var arrayStrings = new string[1000];
    var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
    Parallel.ForEach<string>(arrayStrings, options, someString =>
    {
        monitor.Add(monitor.Count);

        System.Threading.Thread.Sleep(100000);

        monitor.TryTake(out int result);
        monitorOut.Add(result);
    });

    Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}

I recommend setting ParallelOptions.MaxDegreeOfParallelism. It will not necessarily increase the number of threads in use, but it will ensure you only start a sane number of threads, which seems to be your concern.

Lastly, to answer your question: no, you will not get all threads to start at once. Use Parallel.Invoke if you are looking to invoke a set of actions in parallel as simultaneously as possible, e.g. for testing race conditions (a minimal Parallel.Invoke sketch follows the test below).

// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623368346
// 636462943623368346
// 636462943623373351
// 636462943623393364
// 636462943623393364
[Test]
public void Start_Times()
{
    ConcurrentBag<string> monitor = new ConcurrentBag<string>();
    ConcurrentBag<string> monitorOut = new ConcurrentBag<string>();
    var arrayStrings = new string[1000];
    var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
    Parallel.ForEach<string>(arrayStrings, options, someString =>
    {
        monitor.Add(DateTime.UtcNow.Ticks.ToString());
        monitor.TryTake(out string result);
        monitorOut.Add(result);
    });

    var startTimes = monitorOut.OrderBy(x => x.ToString()).ToList();
    Console.WriteLine(string.Join(Environment.NewLine, startTimes.Take(10)));
}
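
For reference, a minimal Parallel.Invoke sketch (the actions are just placeholders) that runs a fixed set of delegates in parallel:

Parallel.Invoke(
    () => Console.WriteLine($"Action 1 on thread {Thread.CurrentThread.ManagedThreadId}"),
    () => Console.WriteLine($"Action 2 on thread {Thread.CurrentThread.ManagedThreadId}"),
    () => Console.WriteLine($"Action 3 on thread {Thread.CurrentThread.ManagedThreadId}"));

Parallel.Invoke blocks until all of the supplied actions have completed.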
Timothy Gonzalez
4

It works out an optimal number of threads based on the number of processors/cores. They will not all spawn at once.
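
A quick way to observe this on your own machine is to count the distinct managed thread IDs that a CPU-bound loop actually uses and compare that against Environment.ProcessorCount (a minimal sketch; the spin-wait is just a stand-in for real work):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ThreadCountDemo
{
    static void Main()
    {
        var threadIds = new ConcurrentDictionary<int, byte>();
        Parallel.For(0, 1000, i =>
        {
            threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, 0);
            Thread.SpinWait(1000000); // simulate a short CPU-bound workload
        });
        Console.WriteLine($"Processor count:       {Environment.ProcessorCount}");
        Console.WriteLine($"Distinct threads used: {threadIds.Count}");
    }
}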

Colin Mackay
4

See Does Parallel.For use one Task per iteration? for an idea of a "mental model" to use. However, the author does state that "At the end of the day, it’s important to remember that implementation details may change at any time."

Kevin Hakanson
0

Does Parallel.ForEach limit the number of active threads?

If you don't configure the Parallel.ForEach with a positive MaxDegreeOfParallelism, the answer is no. With the default configuration, and assuming a source sequence of sufficiently large size, the Parallel.ForEach will use all the threads that are immediately available in the ThreadPool, and will continuously ask for more. By itself the Parallel.ForEach imposes zero restrictions on the number of threads. It is only limited by the capabilities of the associated TaskScheduler.

So if you want to understand the behavior of an unconfigured Parallel.ForEach, you have to know the behavior of the ThreadPool, which is simple enough to describe in a single paragraph. The ThreadPool immediately satisfies all requests for work by starting new threads, up to a soft limit, which by default is Environment.ProcessorCount. Upon reaching this limit, further requests are queued, and new threads are created at a rate of one new thread per second to satisfy the demand¹. There is also a hard limit on the number of threads, which on my machine is 32,767 threads. The soft limit is configurable with the ThreadPool.SetMinThreads method. The ThreadPool also retires threads, in case there are too many and there is no queued work, at about the same rate (1 per second).

Below is an experimental demonstration that the Parallel.ForEach uses all the available threads in the ThreadPool. The number of available threads is configured up front with the ThreadPool.SetMinThreads, and then the Parallel.ForEach kicks in and takes all of them:

ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 10);
HashSet<Thread> threads = new();
int concurrency = 0;
int maxConcurrency = 0;
Parallel.ForEach(Enumerable.Range(1, 1500), n =>
{
    lock (threads) maxConcurrency = Math.Max(maxConcurrency, ++concurrency);
    lock (threads) threads.Add(Thread.CurrentThread);
    // Simulate a CPU-bound operation that takes 200 msec
    Stopwatch stopwatch = Stopwatch.StartNew();
    while (stopwatch.ElapsedMilliseconds < 200) { }
    lock (threads) --concurrency;
});
Console.WriteLine($"Number of unique threads: {threads.Count}");
Console.WriteLine($"Maximum concurrency: {maxConcurrency}");

Output (after waiting for ~5 seconds):

Number of unique threads: 102
Maximum concurrency: 102

Online demo.

The number of completionPortThreads is irrelevant for this test. The Parallel.ForEach uses the threads designated as "workerThreads". The 102 threads are:

  • The 100 ThreadPool threads that were created immediately on demand.
  • One more ThreadPool thread that was injected after 1 sec delay.
  • The main thread of the console application.

¹ The injection rate of the ThreadPool is not documented. As of .NET 7 it is one thread per second, but this could change in future .NET versions.

Theodor Zoulias