
I've been reading about multi-threaded programming and the optimal number of threads. I understand that this is subjective, varies on a case-by-case basis, and that the real optimum can be found only through trial and error.

However, I've found many posts saying that if the task is not I/O-bound, then

Optimal: numberOf(threads) ~= numberOf(cores)

Please take a look at Optimal number of threads per core

Q) How can the above equation be valid if hundreds or thousands of background (OS and other) threads are already fighting for their turn?

Q) Doesn't having a few more threads increase the probability of being allotted a core?
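To make the rule of thumb concrete, here is a minimal Python sketch that sizes a worker pool to the core count. (Note this is purely illustrative of the sizing rule: in CPython the GIL serializes CPU-bound threads, so a language with true thread-level parallelism, or a process pool, is needed to actually see the speedup.)

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Rule of thumb for non-I/O-bound work: numberOf(threads) ~= numberOf(cores).
n_cores = os.cpu_count() or 1

def cpu_bound_task(n):
    # A stand-in for pure computation with no blocking.
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=n_cores) as pool:
    results = list(pool.map(cpu_bound_task, [10_000] * n_cores))

print(len(results), "tasks completed on", n_cores, "workers")
```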

phanin
  • The other background processes are not really "fighting" to get a turn, unless your CPU is pegged at 100% when idle at the desktop. Which is also why you don't usually need to worry about the probability of being allotted a core - most of the time, you're not running more than one processor-intensive application. – Blorgbeard Jul 02 '14 at 03:08
  • By raising the number of threads running in your program, assuming that all threads in the OS are equally sharing CPU time, you are basically trading throughput for responsiveness. Your program will be getting more time slices, but at the same time it will be making the OS consume more real execution time for managing them. Of course, with priorities, it may all be in vain. – didierc Jul 02 '14 at 07:08

1 Answer


The "optimal" only applies to threads that are executing full throttle. The 1000+ threads you can see in use in, say, the Windows Task Manager are threads that are not executing. They are waiting for a notification, blocking on a synchronization object's wait() call.

That includes I/O, but it can also be a timer, a driver event, a process-interop synchronization object, a UI thread waiting for a message, etcetera. The latter are much less visible since they are usually wrapped by a friendly API.
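A small Python sketch of the point above: dozens of threads blocked on a synchronization object (here a `threading.Event`, standing in for any wait call) are parked in the kernel and burn essentially no CPU, just like the 1000+ idle threads Task Manager shows.

```python
import threading
import time

go = threading.Event()

def worker():
    go.wait()  # blocks without consuming a core until signaled

# 100 threads, all immediately blocked on the event.
threads = [threading.Thread(target=worker) for _ in range(100)]
for t in threads:
    t.start()

time.sleep(0.1)  # all 100 threads are waiting; CPU usage stays near zero
go.set()         # wake them; each returns immediately
for t in threads:
    t.join()

print("all", len(threads), "threads finished")
```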

Writing a program that has as many threads as the machine has cores, all burning 100% core, is not actually that common. You'd have to solve the kind of problem that requires pure calculation. Real programs are typically bogged down by the need to read/write the data to perform an operation or are throttled by the rate at which data arrives.

Overscheduling the processor is not a good strategy if you have threads burning 100% of a core. They'll start to fight with each other, and the context-switching overhead causes less work to get done. It is fine when they block. Blocking automatically makes a core available to do something else.
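The "fine when they block" half of that claim can be sketched in Python: fifty tasks that each block for 0.2 s (simulated I/O via `time.sleep`) overlap almost perfectly on a pool of fifty threads, even on a machine with only a few cores, because a blocked thread immediately frees its core. The wall time comes out close to a single task's 0.2 s rather than 50 × 0.2 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_task(_):
    time.sleep(0.2)  # simulates I/O: the thread blocks, freeing its core

start = time.monotonic()
with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(blocking_task, range(50)))
elapsed = time.monotonic() - start

# Far less than the 10 s that serial execution would take.
print(f"50 blocking tasks took {elapsed:.2f}s of wall time")
```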

Hans Passant
  • To clarify: Is the implicit assumption with the "1 thread per core" strategy that there will be a significant amount of time when all of my application threads are ready to run when all other threads from all other processes are blocked/waiting, and therefore all my threads will often get scheduled simultaneously on different cores? – John Dec 29 '15 at 08:12
  • The general assumption is that you don't have to fight with threads owned by other processes. That is not terribly realistic, but nobody will be surprised when it happens. You can, however, always configure a machine to get very close to that ideal: just don't run those other processes. And you will when it is important; dedicating a machine to a single job is certainly not unusual in server-style apps, for example. – Hans Passant Dec 29 '15 at 08:37