We detected a weird problem when running a parallel GroupBy on a system with high amount of cores.
We're running this on .Net Framework 4.7.2.
The (simplified) code:
public static void Main()
{
//int MAX_THREADS = Environment.ProcessorCount - 2;
//ThreadPool.SetMinThreads(1, 1);
//ThreadPool.SetMaxThreads(MAX_THREADS, MAX_THREADS);
var elements = new List<ElementInfo>();
for (int i = 0; i < 250000; i++)
elements.Add(new ElementInfo() { Name = "123", Description = "456" });
using (var cancellationTokenSrc = new CancellationTokenSource())
{
var cancellationToken = cancellationTokenSrc.Token;
var dummy = elements.AsParallel()
.WithCancellation(cancellationToken)
.Select(x => new { Name = x.Name })
.GroupBy(x => "abc")
.ToDictionary(g => g.Key, g => g.ToList());
}
}
public class ElementInfo
{
public string Name { get; set; }
public string Description { get; set; }
}
This code is running in an application that is already using about 100 threads. Running this on a "normal" pc (12 or 16 cores), it runs very fast (less than 1 second). Running this on a PC with a high amount of cores (48), it runs very slow (20 seconds).
Taking a dump during the 20 second delay, I see the threads running this LINQ are all waiting in HashRepartitionEnumerator.MoveNext().
There's a m_barrier.Wait(), so I think it is waiting there. It seems to wait on m_barrier, which is set to the number of partitions.
My guess is the following:
The number of partitions is set to the number of cores (48 in this case).
A number of threads are started in the thread pool, but the thread pool is full, so new threads need to be started. This happens at 1 thread per second.
While the threadpool is spinning up threads, all threads already running this LINQ query, are waiting until enough threads are started.
Only when enough threads are started, the LINQ query can finish.
Uncommenting the first lines in the Main method supports this thesis: By limiting the number of threads, the desired amount of threads is never reached, so this LINQ query never finishes.
Does this seem like a bug in .Net Framework, or am I doing something wrong?
Note: the real LINQ query has a few CPU-intensive Where-clauses, which makes it ideal to run in parallel. I removed this code as it isn't needed to reproduce the issue.