
Say I have a large list of integers and I need to iterate through them and double every single one. These are stateless/independent operations so I could easily split the workload across multiple threads/cores.
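For concreteness, a minimal sketch of the kind of split I mean (Java parallel streams here, purely as an illustrative choice of language and API):

    import java.util.Arrays;
    import java.util.stream.IntStream;

    public class DoubleAll {
        public static void main(String[] args) {
            int[] numbers = IntStream.rangeClosed(1, 10_000_000).toArray();

            // parallel() splits the work across the common ForkJoinPool,
            // which by default keeps roughly one worker per available CPU core
            int[] doubled = Arrays.stream(numbers)
                                  .parallel()
                                  .map(n -> n * 2)
                                  .toArray();

            System.out.println(doubled[doubled.length - 1]); // prints 20000000
        }
    }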

But what if I need to do this on a high traffic website?

Each request to the server spins up a new thread, so all my server cores will always be busy processing incoming requests. In this case, since there are no cores without work, parallel processing would have no positive effect on performance, right?

Or am I missing/misunderstanding something here?

lobsteris
  • Maybe the answer depends on how soon you need the work to be finished. If finishing it quickly is a higher priority than promptly serving all of the active clients, then it still would make sense to do it in parallel. OTOH, if you need both to finish it quickly *AND* not delay any of the other traffic, then the real problem is, you need to add more hosts to your server farm. – Solomon Slow Jan 08 '22 at 15:03
  • I guess my concern was if I have 8 cores total, 7 of them busy serving other requests and only 1 available to do the heavy data processing work, then even if I tried to execute that work in parallel, it would still end up being executed sequentially, since the 7 other cores are busy. – lobsteris Jan 08 '22 at 19:13

1 Answer


Q :
"... parallel processing would have no positive effect on performance, right?"

A :
Given the context defined above, the deduction is right.


Details matter, yet a few rules of thumb hold no matter how high or low the "surrounding" (background) workload of the concurrently scheduled resources is. The O/S schedules the queue of waiting threads onto the CPU cores in a round-robin fashion (ignoring, for the moment, the slight imbalance introduced by relative priority re-ordering), and it also forces active threads to hop from one CPU core to another, for example for thermal-throttling reasons. Every such hop depletes the cache-line hit-rate, and falling out of cache into RAM I/O costs roughly three orders of magnitude more than in-cache computing. These are just a few of the performance-degrading phenomena that set the real-world limits when trying to go for maximum performance, and together they reduce the net effect of idealised scheduling. On top of that, every additional stream of execution carries its own add-on overheads, which you would not pay at all without spawning it.
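To make that add-on overhead concrete, here is a rough micro-benchmark sketch of my own (an illustration, not taken from any linked material): the same doubling work done once in a plain loop and once chopped into many small tasks submitted to a thread pool, where every submission pays for task creation, queueing and scheduling on top of the doubling itself. Depending on the hardware and the chunk size, the pooled version can easily come out slower even though it uses all cores.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.stream.IntStream;

    public class OverheadSketch {
        public static void main(String[] args) throws InterruptedException {
            int[] data = IntStream.range(0, 1_000_000).toArray();
            int chunk = 1_000;

            // plain sequential loop: no task hand-off, no scheduling
            long t0 = System.nanoTime();
            for (int i = 0; i < data.length; i++) data[i] *= 2;
            long sequentialUs = (System.nanoTime() - t0) / 1_000;

            // same work as many tiny pool tasks: each submit() adds
            // task-creation, queueing and scheduling costs
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            CountDownLatch done = new CountDownLatch(data.length / chunk);
            long t1 = System.nanoTime();
            for (int start = 0; start < data.length; start += chunk) {
                final int s = start;
                pool.submit(() -> {
                    for (int i = s; i < s + chunk; i++) data[i] *= 2;
                    done.countDown();
                });
            }
            done.await();
            long pooledUs = (System.nanoTime() - t1) / 1_000;
            pool.shutdown();

            System.out.printf("sequential: %d us, pooled: %d us%n",
                    sequentialUs, pooledUs);
        }
    }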

A simple, interactive GUI tool says more than any amount of text on this subject. Tweak the overhead fraction and first test with a fully 100%-parallel problem fraction (which never happens in the real world: even loading the process itself into RAM is a pure-[SERIAL] part of the overall computing problem under review, isn't it?). Once you have seen the "costs" of the accumulated overheads, move the p-fraction from the full 100% down to 0.999, 0.998, 0.997 ... to see the impact of a smaller p-fraction ( parallel / ( parallel + serial ) ) on the achievable Speedup.
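If you prefer raw numbers to a slider, here is a minimal sketch of the same experiment, assuming the classic Amdahl formulation S(N) = 1 / ( ( 1 - p ) + p / N ) plus an illustrative per-stream setup-overhead term o (the exact overhead model and the example values below are my assumptions, not figures taken from the linked tool):

    public class AmdahlSketch {
        // classic Amdahl's law: speedup on n processors for a parallel fraction p
        static double amdahl(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        // overhead-strict variant: each extra stream of execution also costs a
        // setup/coordination fraction o of the original serial runtime
        static double amdahlWithOverhead(double p, int n, double o) {
            return 1.0 / ((1.0 - p) + p / n + n * o);
        }

        public static void main(String[] args) {
            int cores = 8;            // illustrative core count
            double overhead = 0.01;   // illustrative 1% per-stream overhead
            for (double p : new double[]{ 1.000, 0.999, 0.998, 0.997 }) {
                System.out.printf("p = %.3f  ideal S(%d) = %.2f  with overhead = %.2f%n",
                        p, cores, amdahl(p, cores), amdahlWithOverhead(p, cores, overhead));
            }
        }
    }

Even at p = 0.997 the ideal speedup on 8 cores is already below 8, and with the assumed 1% per-stream overhead it drops to roughly 4.8, which is exactly the kind of effect the slider makes visible.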

If you are interested in more hardware & software details (facing the same dilemma), feel free to use this & this & this.


user3666197