This should be fairly simple, but I'm running into an issue trying to run a basic nested for loop in OpenMP:
    for(z=start;z<=end;z++){
        offset=sizeof(int)*(z*r*c);
        fseek(fpIn,offset,SEEK_SET);
        fread(tempbuffer,sizeof(int),r*c,fpIn);
        #pragma omp parallel for collapse(2) private(x,y,z) schedule(static)
        for(y=0;y<c;y++){
            for(x=0;x<r;x++){
                if(z>=z0 && z<z1 && y>=y0 && y<y1 && x>=x0 && x<x1){
                    volbuffer[y*c+x] = proc(tempbuffer[y*c+x]);
                }
            }
        }
        fseek(fpOut,offset,SEEK_SET);
        fwrite(volbuffer,sizeof(int),r*c,fpOut);
    }
where proc() is a function that does some very basic arithmetic on the input value. However, it turns out to be very slow when run. The #pragma only affects a simple 1D array. volbuffer and tempbuffer are the same size; I read into tempbuffer only to reduce the possibility of false sharing, yet this still scales very poorly. What am I doing wrong here?
Here r and c are the side lengths of a matrix. What I'm trying to do is edit each value of the matrix. The proc(val) function consists only of
    return val + 5;
so it shouldn't take long.
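For reference, here is a minimal self-contained sketch of the per-slice step with the file I/O left out, so it can be compiled and tested on its own. The names `Box` and `process_slice` are hypothetical and not part of my program. The sketch indexes with `y * r + x`, assuming each row holds r values (identical to `y*c+x` when r == c, as in the 9x9 test). The loop indices are declared in the loop headers, which makes them private per thread automatically, and z is passed in by value rather than listed in `private`, since a private copy starts out uninitialized inside the parallel region.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as proc() in the question. */
static int proc(int val) { return val + 5; }

/* Hypothetical container for the bounds x0..x1, y0..y1, z0..z1. */
typedef struct { int x0, x1, y0, y1, z0, z1; } Box;

/* Apply proc() to every element of slice z that lies inside the box.
   Elements outside the box are left untouched in volbuffer. */
void process_slice(const int *tempbuffer, int *volbuffer,
                   int r, int c, int z, Box b) {
    assert(r > 0 && c > 0);
/* The guard lets the same file build with or without OpenMP enabled. */
#ifdef _OPENMP
#pragma omp parallel for collapse(2) schedule(static)
#endif
    for (int y = 0; y < c; y++) {
        for (int x = 0; x < r; x++) {
            if (z >= b.z0 && z < b.z1 &&
                y >= b.y0 && y < b.y1 &&
                x >= b.x0 && x < b.x1) {
                /* Assumption: row length is r. */
                size_t i = (size_t)y * (size_t)r + (size_t)x;
                volbuffer[i] = proc(tempbuffer[i]);
            }
        }
    }
}
```

Calling `process_slice(tempbuffer, volbuffer, r, c, z, box)` between the fread and fwrite calls would stand in for the nested loop above.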
Running it on a 9x9 matrix (r=c=9), I get the following benchmarks:
Without OpenMP:
0.290381 seconds
0.287123 seconds
0.293081 seconds
0.298092 seconds
With OpenMP:
0.516495 seconds
0.511104 seconds
0.508267 seconds
0.521731 seconds
I'm using an i7-8550U, in case that is significant.
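One way timings like the ones above could be taken with the file I/O excluded is sketched below. This is an illustration only, not the harness that produced the numbers above. `wall_seconds` and `time_passes` are hypothetical names, and the sketch assumes a C11 library that provides `timespec_get`. Wall-clock time is used instead of `clock()` because `clock()` reports CPU time, which adds up across threads.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: current wall-clock time in seconds (C11). */
static double wall_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Same arithmetic as proc() in the question. */
static int proc(int val) { return val + 5; }

/* Apply proc() to n values, repeated reps times, and return the
   average wall-clock seconds per pass. Each pass opens a new
   parallel region, as the loop over z in the question does. */
double time_passes(const int *in, int *out, size_t n, int reps) {
    double t0 = wall_seconds();
    for (int rep = 0; rep < reps; rep++) {
/* The guard lets the same file build with or without OpenMP enabled. */
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
        for (long i = 0; i < (long)n; i++) {
            out[i] = proc(in[i]);
        }
    }
    return (wall_seconds() - t0) / reps;
}
```

Building the same file once with `-fopenmp` (GCC or Clang) and once without, then calling `time_passes(in, out, 81, reps)` for the 9x9 case, gives a like-for-like comparison of the loop alone.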