I am aware of the bottleneck for parallel programs, including the memory access speed limit that causes multithread programs to run slower than our expected running time, or even slower than its sequential version. I was wondering if the reverse can be possible (explained as follows).
Specifically, I was wondering if there is a case, where a multithread program with k
threads runs more than k
times faster than the sequential version of the program on the same machine. Let's say that the sequential version of the program takes 100 seconds to do the task, while the multithread version using 5 threads takes 10 seconds to finish.
I assume that both programs have the same algorithms, the same data-structures, the same implementation, and the same compiler with the same optimization options.
One possible case could be the hardware which can perform better for multithread programs, but I am not aware of such hardware (the question is whether there exist any). The other case could be the software implementation in lower levels. For example the compiler having a better optimization for the multithread version, but does such case exist in real compilers?
Edit: One indication for this answer is the AMD that is said to perform better on multithread tasks, rather than single thread tasks. But how is it done? (Difference between intel and AMD multithreading)