I wrote a simple code snippet in which the workload differs greatly from thread to thread. Some threads need to calculate several hundred iterations, while others need just one iteration to get the desired result:
for(int i=0; i<height; i++){
    for(int j=0; j<width; j++){
        complex<float> c((float)j/width-1.5,(float)i/height-0.5);
        complex<float> z(0, 0);
        int count = 0;
        while(abs(z) < 2 && count < MAX_IT){
            z = z*z + c;
            ++count;
        }
        image[i][j] = count;
    }
}
With lscpu I checked how many cores, threads per core, and cores per socket are available. Now I want to parallelize this snippet with OpenMP in a way that is aware of the CPU topology.
It is possible to define environment variables such as
OMP_PLACES='threads(12)'
OMP_PLACES='cores(4)'
OMP_PLACES='sockets(2)'
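To see what a given OMP_PLACES setting actually produces, the runtime can be queried from inside the program. The following is only a minimal sketch, assuming a compiler with OpenMP 4.5 or later; the function name report_places is made up for illustration, and the code falls back to returning 0 when built without OpenMP.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from the original post): prints the places the
// OpenMP runtime built from OMP_PLACES and returns how many there are.
// Returns 0 when compiled without OpenMP support.
int report_places() {
#ifdef _OPENMP
    int n = omp_get_num_places();
    for (int p = 0; p < n; ++p) {
        // Number of hardware threads (logical CPUs) that belong to place p.
        std::printf("place %d: %d hardware thread(s)\n",
                    p, omp_get_place_num_procs(p));
    }
    return n;
#else
    return 0;
#endif
}
```

Calling report_places() from main and running the binary once with each of the three settings above should make the difference between threads, cores, and sockets visible on a given machine.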
There is also the possibility of processor binding, for example
#pragma omp parallel proc_bind(master|close|spread)
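For reference, this is roughly how I would expect the binding clause to be attached to the loop above. It is a sketch under assumptions, not a verified solution: the function name render, the value of MAX_IT, and the use of std::vector for image are placeholders, and schedule(dynamic) is my own addition to cope with the uneven per-row workload.

```cpp
#include <complex>
#include <vector>

// Assumed iteration cap; the real value of MAX_IT is not given in the post.
constexpr int MAX_IT = 1000;

// Hypothetical helper: fills a height x width grid with iteration counts.
std::vector<std::vector<int>> render(int height, int width) {
    std::vector<std::vector<int>> image(height, std::vector<int>(width));
    // proc_bind(spread) asks the runtime to distribute threads across the
    // places defined by OMP_PLACES; schedule(dynamic) hands out rows on
    // demand so that cheap rows do not leave threads idle.
    #pragma omp parallel for proc_bind(spread) schedule(dynamic)
    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            std::complex<float> c((float)j / width - 1.5f,
                                  (float)i / height - 0.5f);
            std::complex<float> z(0, 0);
            int count = 0;
            while (std::abs(z) < 2 && count < MAX_IT) {
                z = z * z + c;
                ++count;
            }
            image[i][j] = count;
        }
    }
    return image;
}
```

Whether spread or close is the better choice here presumably depends on the machine; without OpenMP enabled at compile time the pragma is ignored and the loop simply runs serially.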
I cannot work out how to use them correctly other than by trial and error. Does anybody have experience with this?
Thank you