
I am using Parallel.For to start a large number of jobs (say 1000). This works well; however, each job is also quite memory intensive, and from what I can tell Parallel.For starts a much higher number of parallel jobs than I would expect.

Running on my old home dev box with 4 cores, I see 400+ ongoing jobs:

[Screenshot: up to 411 ongoing jobs]

This might be fine; however, each of these jobs runs a relatively memory-intensive algorithm. As a result the memory usage of the program is high, and I suspect performance is now impeded by memory swapping.


Currently I am not using any ParallelOptions, just running with the defaults. But I am wondering if I should be setting MaxDegreeOfParallelism to keep memory usage from exploding. Or am I overthinking this, and does Parallel already take something smarter into account?
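For reference, here is a minimal sketch of the pattern I have in mind if I do set the option (DoMemoryIntensiveWork is just a stand-in for my actual job):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var options = new ParallelOptions
        {
            // Cap the number of concurrent jobs instead of running with the defaults.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 1000, options, i =>
        {
            DoMemoryIntensiveWork(i);
        });
    }

    // Stand-in for the actual memory-intensive algorithm.
    static void DoMemoryIntensiveWork(int i)
    {
        var buffer = new byte[64 * 1024 * 1024]; // simulate a large working set
        buffer[0] = (byte)i;
    }
}
```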

David Božjak
  • Interesting side note: I spent quite some time chasing down a memory leak, since I was assuming Parallel.For would always be running a steady number of jobs at any one time (say 8). While I found some minor stuff, there wasn't a "smoking gun" memory leak; the usage was just much larger because of the astounding number of parallel jobs. – David Božjak Apr 17 '23 at 09:55
  • Why are you asking us? You seem to be in a prime position to test things and adjust them as you see fit – unlike everyone else, who has neither your code nor your desired performance characteristics. – Jeroen Mostert Apr 17 '23 at 09:58
  • @JeroenMostert Very good point. I do expect to test it, yes, but I didn't find much info on the inner workings of the Parallel library and was hoping to hear whether it does or doesn't take memory limitations into consideration. – David Božjak Apr 17 '23 at 10:01
  • @DavidBožjak do you have `await`s in your jobs? – Guru Stron Apr 17 '23 at 10:02
  • _"I didn't find much info on the inner workings of the Parallel library"_ - How about [source.dot.net](https://source.dot.net/#System.Threading.Tasks.Parallel/System/Threading/Tasks/Parallel.cs,17370e197c0598e6) ? – Fildor Apr 17 '23 at 10:09
  • @DavidBožjak I don't know either, but the TPL will probably also take the current memory into account when creating tasks. That doesn't help you, though, because the memory usage will change dynamically later. Can't you optimize the memory management of your code (for example, use singletons where possible)? But in general you're right: you should use `MaxDegreeOfParallelism` and analyse which value is best for your use case. – Tim Schmelter Apr 17 '23 at 10:13
  • Just reporting back: I do see vast perf improvements when limiting MaxDegreeOfParallelism. I guess I will need to spend time fine-tuning (which I was hoping to avoid by using Parallel.For), but in a way an answer is clear. – David Božjak Apr 17 '23 at 10:51
  • Does your work use any type of IO operation? I believe that Parallel.For should work fairly well for CPU-bound operations by default. But if there is any type of blocking operation (IO, locks, sleep, etc.), the scheduler will notice that the CPU is available and start additional work. – JonasH Apr 17 '23 at 11:27

2 Answers

3

But I am wondering if I should be setting MaxDegreeOfParallelism to keep memory usage from exploding. Or am I overthinking this, and does Parallel already take something smarter into account?

If you don't provide ParallelOptions then the defaults will be used, which have MaxDegreeOfParallelism set to -1, i.e.:

If it is -1, there is no limit on the number of concurrently running operations (with the exception of the ForEachAsync method, where -1 means ProcessorCount).

So the parallelism limits will be provided by the task scheduler in use; if none is provided, the default one (TaskScheduler.Default) will just post everything to the thread pool, which can allocate up to ThreadPool.GetAvailableThreads(out int workerThreads, out int completionPortThreads) threads. AFAIK it considers mainly available CPUs, threads and CPU load (see this answer); memory is not taken into account, at least not directly (though in case of extreme memory usage the GC can consume a lot of CPU, affecting the monitored resources).
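For illustration, here is a small sketch using the standard ThreadPool APIs (nothing specific to your workload is assumed) that prints the limits the default scheduler works against; note that none of them reflect memory pressure:

```csharp
using System;
using System.Threading;

class ThreadPoolInfo
{
    static void Main()
    {
        // These are the only numbers the pool exposes here: thread counts,
        // with no memory-related metric among them.
        ThreadPool.GetMinThreads(out int minWorkers, out int minIocp);
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIocp);
        ThreadPool.GetAvailableThreads(out int availWorkers, out int availIocp);

        Console.WriteLine($"Processor count:          {Environment.ProcessorCount}");
        Console.WriteLine($"Min worker threads:       {minWorkers}");
        Console.WriteLine($"Max worker threads:       {maxWorkers}");
        Console.WriteLine($"Available worker threads: {availWorkers}");
    }
}
```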

So, in short: you will need to test your actual workloads and adjust accordingly.

Guru Stron
  • Thank you for the answer. I am a bit confused about the default behavior here, since it definitely does not limit the parallelism to the processor count (4 on the machine in question) and is instead much, much larger than that (500+). But I have verified that providing MaxDegreeOfParallelism does limit it to that value. – David Božjak Apr 17 '23 at 10:54
  • *"AFAIK it considers mainly available CPUs, threads and CPU load."* – I am skeptical of the claim that the CPU load has anything to do with the `ThreadPool` availability. Most likely it's a myth. I've never seen an experimental proof of this claim. – Theodor Zoulias Apr 17 '23 at 11:29
0

Yes, you should definitely limit the MaxDegreeOfParallelism to a reasonable value like Environment.ProcessorCount, not only because of memory usage considerations but also because most likely you want a consistent behavior across subsequent Parallel executions.

The default MaxDegreeOfParallelism is -1, which means unlimited parallelism, and in practice saturates the ThreadPool. A saturated ThreadPool creates one new thread every second, which means that the effective degree of parallelism of the Parallel operation increases over time. After the completion of the Parallel loop the ThreadPool is no longer saturated, and starts terminating superfluous threads at about the same rate (1/sec). So the effective degree of parallelism of the next Parallel operation will depend on the duration of the previous Parallel operation, and on how much time has passed after the completion of the previous Parallel operation. Basing the behavior of your program on such random, non-deterministic factors is unlikely to be your intention.

I have posted here an experimental demonstration that an unconfigured Parallel execution uses all the available ThreadPool threads, and keeps asking for more. You could also take a look at this answer, for a more detailed description of the inner workings of the Parallel class in respect to the MaxDegreeOfParallelism option.
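Along the same lines, here is a minimal sketch of such a demonstration (the 100 ms Thread.Sleep is an arbitrary stand-in for blocking work). It counts the iterations that run concurrently, and on a typical machine the reported peak climbs well above Environment.ProcessorCount:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ConcurrencyProbe
{
    static void Main()
    {
        int current = 0; // iterations running right now
        int peak = 0;    // highest concurrency observed so far

        Parallel.For(0, 1000, _ =>
        {
            int now = Interlocked.Increment(ref current);
            int snapshot;
            // Lock-free max: record the highest concurrency level seen so far.
            while (now > (snapshot = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, snapshot) != snapshot) { }
            Thread.Sleep(100); // arbitrary stand-in for blocking work
            Interlocked.Decrement(ref current);
        });

        Console.WriteLine($"Peak concurrency: {peak}");
    }
}
```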

Theodor Zoulias