I'm having trouble with OpenMP on macOS, using the default compiler and libomp for the OpenMP features.
More precisely, when I trace the CPU usage of the code below, I expect to see two cores at 100% (two independent sections, one thread each).
On Linux this works perfectly: my process runs at 200% CPU usage.
On Mac, Instruments shows a very strange behavior: a peak at the very beginning at ~1000% CPU usage (my computer is a dual-processor 6-core Intel Xeon E5, so ~12 cores) with a lot of threads being created, followed by a plateau at 200% (as expected).
The problem is that this peak/warm-up completely kills performance when I iterate over parallel sections.
Does anyone have an explanation for this?
Edit: I have updated the code to clarify the problem and added a snapshot of the execution behavior (warm-up at ~1000%, a plateau at 200%, then a smaller one at 100% because function_2 is slightly slower than function_1; the whole pattern is repeated twice).
#include <iostream>
#include <algorithm>
#include <vector>
#include <cmath>

auto N = 5000;
std::vector<int> vA(N), vB(N);

void function_1()
{
    for (int k = 0; k != 3; k++)
    {
        std::cout << "Function 1 (k = " << k << ")" << std::endl;
        for (auto i = 0; i < N; ++i)
            for (auto j = 0; j < N; ++j)
                vA[j] += i + cos(j); // Doing something meaningless
    }
}

void function_2()
{
    for (int k = 0; k != 4; k++)
    {
        std::cout << "Function 2 " << "(k = " << k << ")" << std::endl;
        for (auto i = 0; i < N; ++i)
            for (auto j = 0; j < N; ++j)
                vB[j] += i + sin(j); // Doing something meaningless
    }
}

int main()
{
    // The parallel region is entered twice; each time, the two sections
    // should run concurrently on two threads.
    for (auto y = 0; y < 2; ++y)
    {
        #pragma omp parallel sections
        {
            #pragma omp section
            function_1();
            #pragma omp section
            function_2();
        }
    }
    return 0;
}
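
One way to narrow this down, offered as a sketch and not as a confirmed diagnosis: a `parallel sections` region without a `num_threads` clause gets a default-sized team (typically one thread per logical core), and only two of those threads have a section to run, while the others wait at the implicit barrier at the end of the region. Whether that waiting explains the ~1000% peak is an assumption I have not verified. The helpers below (`default_team_size`, `team_size` and `timed_two_sections` are hypothetical names, not part of the code above) report the team size and time one region that is explicitly limited to two threads, so the timings with and without the clause can be compared.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the team size a parallel region gets
// when no num_threads clause is given (1 when built without OpenMP).
int default_team_size()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: team size actually obtained when `requested` threads
// are asked for (must be positive). The runtime may grant fewer.
int team_size(int requested)
{
    int size = 1;
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        {
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
        }
    }
    return size;
}

// Runs the same kind of meaningless work as function_1/function_2 in two
// sections, with the team limited to two threads, and returns the elapsed
// wall-clock time in seconds.
double timed_two_sections(std::vector<double>& a, std::vector<double>& b, int n)
{
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            for (int i = 0; i < n; ++i)
                for (std::size_t j = 0; j < a.size(); ++j)
                    a[j] += i + std::cos(static_cast<double>(j));
        }
        #pragma omp section
        {
            for (int i = 0; i < n; ++i)
                for (std::size_t j = 0; j < b.size(); ++j)
                    b[j] += i + std::sin(static_cast<double>(j));
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `timed_two_sections` in a loop (as `main` does above) and printing each duration next to `default_team_size()` should show whether the first iterations are the slow ones. The standard environment variables `OMP_NUM_THREADS=2` and `OMP_WAIT_POLICY=passive` may be worth trying on the original program as well, since they need no code change; I am only assuming they are relevant here.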