I'm faced with parallelizing an algorithm whose serial implementation examines the six faces of a cube of array locations within a much larger three-dimensional array. (That is, select an array element, then define a cube or cuboid around that element extending 'n' elements in x, y, and z, clipped to the bounds of the array.)
Each work unit looks something like this (Fortran pseudocode; the serial algorithm is in Fortran):
do n1 = nlo, nhi
  do o1 = olo, ohi
    if (somecondition(n1, o1)) then
      retval = .TRUE.
      RETURN              ! bail out as soon as the condition holds
    endif
  end do
end do
Or C pseudocode:
for (n1 = nlo; n1 <= nhi; n1++) {
    for (o1 = olo; o1 <= ohi; o1++) {
        if (somecondition(n1, o1) != 0) {
            return true;    /* bail out as soon as the condition holds */
        }
    }
}
There are six work units like this in the total algorithm, where the 'lo' and 'hi' values generally range between 10 and 300.
What I think would be best is to schedule six or more threads of execution (round-robin if there aren't that many CPU cores), ideally with the loops themselves executing in parallel, with the same goal as the serial algorithm: as soon as somecondition() becomes True in any thread, execution across all the threads must stop immediately and a value of True be set in a shared location.
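As a rough illustration of what I mean, here is a minimal sketch in C of one work unit, assuming a hypothetical somecondition() predicate and an OpenMP 3.1 or later compiler (for the atomic read/write forms); since an OpenMP worksharing loop can't be broken out of, the remaining iterations just check a shared flag and fall through quickly:

/* Minimal sketch of one work unit, parallelized over the outer loop.
 * somecondition() is a stand-in for the real test. */
#include <stdbool.h>

extern bool somecondition(int n1, int o1);   /* hypothetical predicate */

bool check_face(int nlo, int nhi, int olo, int ohi)
{
    bool found = false;

    #pragma omp parallel for shared(found)
    for (int n1 = nlo; n1 <= nhi; n1++) {
        bool stop;
        #pragma omp atomic read
        stop = found;
        if (stop)
            continue;                        /* another thread already found a hit */
        for (int o1 = olo; o1 <= ohi; o1++) {
            if (somecondition(n1, o1)) {
                #pragma omp atomic write
                found = true;                /* signal the other threads */
                break;                       /* the inner loop may be exited normally */
            }
        }
    }
    return found;                            /* implicit barrier makes this final */
}

I gather OpenMP 4.0 also has "#pragma omp cancel for", which abandons the remaining iterations outright, but cancellation has to be enabled at run time (OMP_CANCELLATION=true), so I'm not sure it buys much for loops this small.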
What techniques exist in a Windows compiler to facilitate parallelizing tasks like this? Obviously, I need a master thread which waits on a semaphore or the completion of the worker threads, so there is a need for nesting and signaling, but my experience with OpenMP is introductory at this point.
Are there message passing mechanisms in OpenMP?
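To make the structure I'm imagining concrete, here is a sketch (again in C, with hypothetical check_face_* routines, one per cube face, each containing a loop nest like the one above and consulting the same shared flag) of dispatching the six work units as OpenMP sections; as far as I can tell the implicit barrier at the end of the parallel region plays the role of the master waiting on the workers, but I'd welcome corrections:

#include <stdbool.h>

extern void check_face_xlo(bool *found);     /* hypothetical, one per face */
extern void check_face_xhi(bool *found);
extern void check_face_ylo(bool *found);
extern void check_face_yhi(bool *found);
extern void check_face_zlo(bool *found);
extern void check_face_zhi(bool *found);

bool any_face_hit(void)
{
    bool found = false;

    #pragma omp parallel sections shared(found)
    {
        #pragma omp section
        check_face_xlo(&found);
        #pragma omp section
        check_face_xhi(&found);
        #pragma omp section
        check_face_ylo(&found);
        #pragma omp section
        check_face_yhi(&found);
        #pragma omp section
        check_face_zlo(&found);
        #pragma omp section
        check_face_zhi(&found);
    }
    /* The implicit barrier at the end of the parallel region is effectively
     * the "master waits for the workers" step; no explicit semaphore needed. */
    return found;
}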
EDIT: If the largest difference between "nlo" and "nhi" or between "olo" and "ohi" is eight to ten, that implies no more than 64 to 100 iterations for this nested loop, and no more than 384 to 600 iterations for the six work units together. Based on that, is it worth parallelizing at all?