
Why does the CPU usage per core decrease as I increase the degree of parallelism in the following simple Parallel.For loop? It ranges from 100% usage with 1 core to around 40% with 20 cores (see the screenshot below). Since all variables are local, I would expect the CPU usage per core to remain fairly constant.

public static void myTest()
{
    Parallel.For(0, 50, new ParallelOptions { MaxDegreeOfParallelism = 20 }, (kk) =>
    {
        // Each iteration works exclusively on its own local list.
        List<int> mylist = new List<int>();
        while (true)
        {
            mylist.Add(1);
            if (mylist.Count > 500000)
            {
                // Discard the full list and start over.
                mylist = new List<int>();
            }
        }
    });
}

[Screenshot: per-core CPU usage drops from 100% with 1 core to around 40% with 20 cores]

  • Because you're spending all your time switching contexts and smashing the task scheduler. In short, the example is nonsensical; nothing should work like this. And you should be asking a question about the actual problem you are trying to solve. – TheGeneral Nov 08 '21 at 00:22
  • My real problem is related to a minimum-travel-time algorithm where I am queuing lots of data and see similar behavior. This is just the simplest working example I could find to post here. – SandiaDeDia Nov 08 '21 at 00:27
  • 1
    Cpus have limited amounts of cores and threads, when you reach that limit you are spending excess time trying to time share between threads. The Task scheduler when left to its on devices will try and help you in this process, however if you give it less optimally planned work loads it will behave in less optimal ways. In this case, it will try to keep taking more threads from your thread pool, and instead of processing your workloads efficiently, its going to spend time trying to switch between your badly hewn jobs. In which case, this example is completely non sensical – TheGeneral Nov 08 '21 at 00:32
  • 1
    1. Does the same happens if you preallocate the space required for the data stored in the lists? (`new List(500000)`) 2. Does the same happens if the number of items stored in the lists is significantly smaller? (like 10 items for example) – Theodor Zoulias Nov 08 '21 at 00:39
  • From here it's hard to help you, apart from saying that you should not be specifying the degree of parallelism (let it choose the default unless you know better), you should not be running endless while loops that do nothing after a certain period of time, and you should really be profiling the actual code and showing the results of that actual processing. Quite likely this would fall in the realm of code review. – TheGeneral Nov 08 '21 at 00:40
  • @TheGeneral I wouldn't advise not configuring the `MaxDegreeOfParallelism` of `Parallel.For`/`Parallel.ForEach` loops, unless your goal is to [saturate the `ThreadPool`](https://stackoverflow.com/questions/66261010/multiple-parallel-foreach-loops-in-net/66263583#66263583). – Theodor Zoulias Nov 08 '21 at 00:43
  • @TheodorZoulias decreasing the number of items has no effect, but preallocating the space solves the issue (all cores at 100%). Unfortunately, in the actual problem I use a priority queue and cannot preallocate the data. – SandiaDeDia Nov 08 '21 at 00:49
  • 1
    Hmmm. My wild guess is that the bottleneck is either 1. the garbage collector, who has to recycle all the garbage created by the constantly doubled array storage of the lists, or 2. the memory bandwidth of the machine is saturated due to the high demand for moving data around. – Theodor Zoulias Nov 08 '21 at 00:55
  • @TheodorZoulias sorry, preallocating does not work either. It only works if I create an array and then simply assign `mylist[idx] = 1` instead of using the `Add` method (see the array sketch after the comments). I will investigate more how parallelism works in C#. I have very similar code in C++ with OpenMP and do not have these issues. Thanks. – SandiaDeDia Nov 08 '21 at 00:59
  • You could also experiment with clearing the existing list (`mylist.Clear()`), instead of creating a new list each time. Btw which `PriorityQueue` implementation are you using? The [new class](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.priorityqueue-2) introduced with .NET 6? – Theodor Zoulias Nov 08 '21 at 01:05
  • @TheodorZoulias yes, I agree; the point was more geared towards letting the task scheduler heuristically determine what is needed in an actual real-world situation. The problem is we know nothing about the specs of this system or the actual processing nature of this job. All we know is that this is an unrealistic example with more issues than the Titanic, and any answer one way or another only solves how to write questionable solutions to questionable problems. My gut feeling is that if we knew the true nature of the problem and the calculations, an appropriate solution could be tendered. :) – TheGeneral Nov 08 '21 at 01:20
  • 1
    ```mylist.clear()``` solves the issue !. I am targeting Net 5 and use a custom PriorityQueue from scratch. I will check the one in Net 6 which will be obviously better than the one I have. Thanks again. – SandiaDeDia Nov 08 '21 at 01:23
  • @TheGeneral yeah, the code in the question is quite unnatural. It would be simpler if the OP had just started 20 threads with the `Thread` constructor (sketched below), instead of using the relatively higher-level `Parallel` class. This would eliminate any `ThreadPool`-related considerations. – Theodor Zoulias Nov 08 '21 at 01:33
  • @SandiaDeDia beyond the CPU utilization, you should probably also measure the performance of each alternative implementation. It's not impossible to increase one while reducing the other; for example, calling `List.Clear()` could be more expensive than instantiating a new `List` object. Don't know, just saying. – Theodor Zoulias Nov 08 '21 at 01:56
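
For reference, here is a minimal sketch of the preallocated-capacity variant suggested in the comments, assuming the rest of the loop is unchanged. The count check becomes `==` so the list is reset before it would outgrow its initial capacity; with the original `>` check one resize would still occur:

    // Loop body with preallocated capacity: the backing array is
    // allocated once per list, so Add() never has to grow and copy it.
    List<int> mylist = new List<int>(500000);
    while (true)
    {
        mylist.Add(1);
        if (mylist.Count == 500000)
        {
            mylist = new List<int>(500000);
        }
    }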
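
One way to test the garbage-collector guess is to sample `GC.CollectionCount` around a bounded run of the workload. A sketch, assuming the loop is modified to stop after a fixed time:

    // A rapidly climbing gen 0 count while per-core usage drops would
    // point to the short-lived list allocations as the culprit.
    int gen0Before = GC.CollectionCount(0);
    int gen2Before = GC.CollectionCount(2);
    // ... run a time-bounded version of the parallel loop here ...
    Console.WriteLine($"Gen 0 collections: {GC.CollectionCount(0) - gen0Before}");
    Console.WriteLine($"Gen 2 collections: {GC.CollectionCount(2) - gen2Before}");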
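
The array-based variant that the OP reports keeps all cores at 100%, reconstructed as a sketch; writing into an existing slot allocates nothing per iteration:

    // Fixed-size array: zero per-iteration allocations, so the
    // garbage collector has nothing to reclaim.
    int[] myArray = new int[500000];
    int idx = 0;
    while (true)
    {
        myArray[idx] = 1;
        idx++;
        if (idx == myArray.Length)
        {
            idx = 0; // wrap around instead of reallocating
        }
    }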
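
The `mylist.Clear()` variant that resolved the issue, as a sketch. `Clear` resets the count but keeps the already-grown backing array, so steady-state allocations drop to zero:

    List<int> mylist = new List<int>();
    while (true)
    {
        mylist.Add(1);
        if (mylist.Count > 500000)
        {
            mylist.Clear(); // keeps the backing array; no new allocation
        }
    }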
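
Basic usage of the .NET 6 `PriorityQueue<TElement, TPriority>` mentioned in the comments; the element and priority values here are illustrative only:

    var queue = new PriorityQueue<string, int>();
    queue.Enqueue("walk", 3);   // element, priority
    queue.Enqueue("drive", 1);
    string fastest = queue.Dequeue(); // "drive": lowest priority value dequeues first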
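
A sketch of the raw-`Thread` experiment Theodor Zoulias describes, which takes the `ThreadPool` out of the picture entirely:

    // 20 dedicated threads running the same workload as the
    // Parallel.For lambda, with no ThreadPool heuristics involved.
    for (int i = 0; i < 20; i++)
    {
        var thread = new Thread(() =>
        {
            List<int> mylist = new List<int>();
            while (true)
            {
                mylist.Add(1);
                if (mylist.Count > 500000)
                {
                    mylist = new List<int>();
                }
            }
        });
        thread.IsBackground = true; // don't keep the process alive
        thread.Start();
    }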
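
Finally, a rough timing harness for comparing the variants, per the last comment. `RunVariant` is a hypothetical placeholder for one of the sketches above modified to terminate; for serious comparisons a tool such as BenchmarkDotNet is more reliable:

    var sw = System.Diagnostics.Stopwatch.StartNew();
    RunVariant(); // hypothetical: a bounded version of one variant
    sw.Stop();
    Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds} ms");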

0 Answers