I have a for-loop that iterates over a fairly large number of points (ca. 20000). For every point, it checks whether the point lies inside a cylinder (the cylinder is the same for every point). I would also like to find the highest Y coordinate in the set of points. Since I have to do this calculation often and it is quite slow, I want to use OpenMP to parallelize the loop. Currently I have (somewhat reduced):

#pragma omp parallel for default(shared) private(reducedCloudSize, reducedCloud, cylinderBottom, cylinderTop) reduction(+:ptsInside, ptsInsideLarger)
for (int i = 0; i < reducedCloudSize; i++){
    highestYCoord = highestYCoord > testPt.y ? highestYCoord : testPt.y;

    if (CylTest_CapsFirst(cylinderBottom,cylinderTop,cylinderHeight*cylinderHeight,cylinderRadius*cylinderRadius,testPt) != -1){
        ptsInside++;
    }

}

Here CylTest_CapsFirst checks whether the point is inside the cylinder. However, this code does not work. If I leave out the reduction(+:ptsInside, ptsInsideLarger) part it actually works, but it is much slower than the non-parallelized version. If I include the reduction clause, the program never even seems to enter the for-loop!

What am I doing wrong?

Thanks!

user1254962
  • @you have a race condition writing to `highestYCoord`. Fix it with critical or atomic. You also might want to rethink which variables you want shared and private. You only write to `highestYCoord` and `ptsInside` in your loop. Those need to be shared. The others (except i) don't matter. Only i needs to be private (which it should be by construction). This assumes that `CylTest_CapsFirst` only reads variables and does not write anything. – Z boson Feb 27 '14 at 18:46
  • @Zboson The poster could also use `reduction(max:highestYCoord)` to avoid using a critical or atomic, resulting in cleaner and possibly faster code. – pburka Feb 28 '14 at 02:57
  • @pburka, that depends on what version of OpenMP the OP has. Also, I agree it would be cleaner but I doubt it would be faster. I have not seen any evidence for that http://stackoverflow.com/questions/21603288/reduction-with-openmp-linear-merging-or-lognumber-of-threads-merging. – Z boson Feb 28 '14 at 07:09

1 Answer

Assuming your function CylTest_CapsFirst does not write to anything (it only reads), the only variables that need to be shared are highestYCoord and ptsInside. The only variable that needs to be private is i. You don't need to declare these explicitly. But you do need to make sure that no two threads write to the shared variables at the same time. To do this efficiently, make private versions of highestYCoord and ptsInside and write to those in the parallel loop. Then merge the private versions into the shared versions in a critical section. This is efficient as long as reducedCloudSize >> number_of_threads.

#pragma omp parallel
{
    double highestYCoord_private = highestYCoord;
    int ptsInside_private = 0;
    #pragma omp for
    for (int i = 0; i < reducedCloudSize; i++){
        highestYCoord_private = highestYCoord_private > testPt.y ? highestYCoord_private : testPt.y;
        if (CylTest_CapsFirst(cylinderBottom,cylinderTop,cylinderHeight*cylinderHeight,cylinderRadius*cylinderRadius,testPt) != -1) {
            ptsInside_private++;
        }
    }
    #pragma omp critical 
    {
        highestYCoord = highestYCoord_private > highestYCoord ? highestYCoord_private : highestYCoord;
        ptsInside += ptsInside_private;
    }
}
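
As an alternative to the manual merge above, the `reduction(max:...)` clause mentioned in the comments can be sketched as a self-contained function. This is only an illustration under several assumptions: the `Point` type, the `inside_test` predicate (a stand-in for `CylTest_CapsFirst`, which is not shown in the post) and the `count_inside` wrapper are all hypothetical, the point is selected by loop index, and `max` reductions in C need OpenMP 3.1 or later. Note that the inputs are left shared here: variables listed in a `private(...)` clause get uninitialized per-thread copies, so a loop bound placed there has an indeterminate value inside the region, which may explain why the loop in the question never seemed to run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical point type; the original post does not show its definition. */
typedef struct { double x, y, z; } Point;

/* Hypothetical stand-in for CylTest_CapsFirst: tests a cylinder aligned with
   the y axis and centred on the origin; returns -1 when the point is outside. */
static int inside_test(Point p, double radius_sq, double y_min, double y_max)
{
    if (p.y < y_min || p.y > y_max) return -1;
    double d_sq = p.x * p.x + p.z * p.z;
    return d_sq <= radius_sq ? 1 : -1;
}

/* Counts the points inside the cylinder and finds the largest y in one pass. */
static int count_inside(const Point *pts, int n, double radius_sq,
                        double y_min, double y_max, double *highest_y)
{
    assert(pts != NULL && n > 0 && highest_y != NULL);

    int inside = 0;
    double max_y = pts[0].y;

    /* Each thread works on its own copies of inside and max_y;
       OpenMP combines them with the originals when the loop ends. */
    #pragma omp parallel for reduction(+:inside) reduction(max:max_y)
    for (int i = 0; i < n; i++) {
        if (pts[i].y > max_y) max_y = pts[i].y;
        if (inside_test(pts[i], radius_sq, y_min, y_max) != -1) inside++;
    }

    *highest_y = max_y;
    return inside;
}
```

With GCC or Clang this is built with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same result.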
Z boson