c++ std::async : faster on 4 cores compared to 8 cores

Question

I have 16000 jobs to perform.

Each job is independent. There is no shared memory, no interprocess communication, no lock or mutex.

I am on Ubuntu 16.06, using C++11. The CPU is an Intel® Core™ i7-8550U @ 1.80GHz × 8.

I use std::async to split jobs between cores.

If I split the jobs into 8 (2000 per core), computation time is 145. If I split the jobs into 4 (4000 per core), computation time is 60.
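For concreteness, a static split of this kind might look like the sketch below. It is not the code from the question, which was not posted: `run_job` is a hypothetical placeholder for one independent job, and `run_static_split` is an illustrative name. The sketch passes `std::launch::async` explicitly, because with the default launch policy the implementation is allowed to defer a task and run it lazily on the thread that calls `get()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for one independent job (assumption: the real work
// is not shown in the question).
long run_job(std::size_t job_id) {
    long acc = 0;
    for (std::size_t i = 0; i < 1000; ++i) {
        acc += static_cast<long>((job_id + i) % 7);
    }
    return acc;
}

// Split [0, num_jobs) into num_tasks contiguous chunks, run each chunk as
// one std::async task, then reduce the partial results.
long run_static_split(std::size_t num_jobs, std::size_t num_tasks) {
    if (num_tasks == 0) num_tasks = 1;
    const std::size_t chunk = (num_jobs + num_tasks - 1) / num_tasks;  // ceiling division

    std::vector<std::future<long>> futures;
    for (std::size_t t = 0; t < num_tasks; ++t) {
        const std::size_t begin = std::min(t * chunk, num_jobs);
        const std::size_t end = std::min(begin + chunk, num_jobs);
        // std::launch::async forces a separate thread per task.
        futures.push_back(std::async(std::launch::async, [begin, end]() -> long {
            long partial = 0;
            for (std::size_t j = begin; j < end; ++j) partial += run_job(j);
            return partial;
        }));
    }

    long total = 0;
    for (auto& f : futures) total += f.get();  // blocks until each task is done
    return total;
}
```

Calling `run_static_split(16000, 8)` and `run_static_split(16000, 4)` corresponds to the two configurations being compared. Both return the same reduced value; only the number of concurrent tasks differs.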

Output after reduce is the same in both cases.

If I monitor the CPU during computation (just using htop), things happen as expected (8 cores are used at 100% in first case, only 4 cores are used 100% in second case).

I am very confused why 4 cores would process much faster than 8.

Well, you are doing it wrong. One way to do it wrong is to wait for the slowest thread to get done. Instead break up the computational job in smaller packets so they can be processed on whatever thread is ready next. Look at the standard producer-consumer algorithm. — Hans Passant, Dec 27 '17 at 09:42
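The idea in the comment above, handing small pieces of work to whichever thread is free next, can be sketched without a full producer-consumer queue. The version below swaps in a simpler technique: a shared atomic counter from which each worker claims the next job index. The names `process_job` and `run_dynamic` are hypothetical, and the job body is a placeholder.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one independent job.
long process_job(std::size_t job_id) {
    long acc = 0;
    for (std::size_t i = 0; i < 1000; ++i) {
        acc += static_cast<long>((job_id + i) % 7);
    }
    return acc;
}

// Each worker repeatedly claims the next unprocessed job index from a shared
// atomic counter, so a slow worker simply ends up taking fewer jobs.
long run_dynamic(std::size_t num_jobs, unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;

    std::atomic<std::size_t> next(0);
    std::vector<long> partial(num_workers, 0);  // one result slot per worker
    std::vector<std::thread> workers;

    for (unsigned w = 0; w < num_workers; ++w) {
        workers.emplace_back([&, w] {
            long local = 0;
            for (;;) {
                const std::size_t j = next.fetch_add(1);  // claim one job
                if (j >= num_jobs) break;                 // nothing left
                local += process_job(j);
            }
            partial[w] = local;  // each worker writes only its own slot
        });
    }
    for (auto& t : workers) t.join();

    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

With this layout the total time is no longer set by the slowest pre-assigned chunk. Claiming one job at a time is the finest granularity; if individual jobs are very short, claiming a small batch per `fetch_add` would reduce contention on the counter.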

Lior Kogan · Accepted Answer · 2017-12-27T09:37:20.620

The i7-8550U has 4 cores and 8 threads.

What is the difference? Quoting How-To Geek:

Hyper-threading was Intel’s first attempt to bring parallel computation to consumer PCs. It debuted on desktop CPUs with the Pentium 4 HT back in 2002. The Pentium 4’s of the day featured just a single CPU core, so it could really only perform one task at a time—even if it was able to switch between tasks quickly enough that it seemed like multitasking. Hyper-threading attempted to make up for that.

A single physical CPU core with hyper-threading appears as two logical CPUs to an operating system. The CPU is still a single CPU, so it’s a little bit of a cheat. While the operating system sees two CPUs for each core, the actual CPU hardware only has a single set of execution resources for each core. The CPU pretends it has more cores than it does, and it uses its own logic to speed up program execution. In other words, the operating system is tricked into seeing two CPUs for each actual CPU core.

Hyper-threading allows the two logical CPU cores to share physical execution resources. This can speed things up somewhat—if one virtual CPU is stalled and waiting, the other virtual CPU can borrow its execution resources. Hyper-threading can help speed your system up, but it’s nowhere near as good as having actual additional cores.

By splitting the jobs across more threads than there are physical cores, you are paying a big penalty.

To make sure I follow: "Hyper-threading can help speed your system up, but it’s nowhere near as good as having actual additional cores." should be extended with "[...] and for your own C++ code, it will actually slow things down" (or does it depend on how the code is structured)? And we have to go read the CPU specs rather than believe the OS? — Vince, Dec 27 '17 at 09:55
If each of your software threads requires 100% processor time (no waits), the optimal number of threads equals the number of cores. Otherwise, the optimal number of threads can be much higher. You should query your OS for the number of cores instead of the number of logical processors. — Lior Kogan, Dec 27 '17 at 10:27
