Reduction in OpenMP Fortran leads to weird results

Question

I have the following Fortran code, modified from https://computing.llnl.gov/tutorials/openMP/exercise.html

```fortran
   PROGRAM REDUCTION
   INTEGER I, J, N
   REAL A(100), B(100), SUM
   REAL SUM2(2)

! Some initializations
   N = 100
   DO I = 1, N
    A(I) = I *1.0
    B(I) = A(I)
   ENDDO

   numthreads = 2
   call omp_set_num_threads(numthreads)

   !$omp parallel
   SUM = 0.0
   !$OMP DO private(I,J) !!REDUCTION(+:SUM2)
   DO J=1,2
    SUM2(J) = 0.0
    DO I = 1, N
     !!SUM = SUM + (A(I) * B(I))
     SUM2(J) = SUM2(J) + (A(I) * B(I))
    ENDDO
   ENDDO
   !$omp end do
   !$omp end parallel

   PRINT *, ' Sum(1) = ', SUM2(1)
   END
```

Once I remove the comment symbols `!!` before `REDUCTION(+:SUM2)` and compile with `gfortran -fopenmp`, I get random results, e.g. `Sum(1) = 338364.812` or `Sum(1) = -1.32860411E+15`, though sometimes the correct `Sum(1) = 338350.000`.

In other words, if I keep the `!!` comment symbols in front of the reduction clause (i.e. no reduction), the result is the same as without OpenMP.

Why doesn't the reduction work here? Do I need a reduction in this example at all?

Are you saying that _with_ the reduction clause you get the unexpected results, or is it _without_ the reduction clause that you do? — francescalus, Jan 14 '21 at 18:10
With the reduction, I get the unexpected results. I have edited my question. — Cougars, Jan 14 '21 at 18:18
About to start a chess tournament so no time, but using reduction is wrong. There is no need to reduce, as each element of sum2 is written to by only one thread, so there is no race condition to protect against, which is what reduction does. However, with the reduction, parts of the thread-local version of sum2 will not be initialised. Hence when you do the sum at the end you are adding in uninitialised values and can get any result. Try adding -finit-real=snan to your compiler flags. Will write an answer tomorrow if nobody else does. — Ian Bush, Jan 14 '21 at 18:33
The linked question has a slightly broader remit, but coupled with Ian Bush's comment the answer to your problem should be clear from it: the reduction clause is wrong because each thread updates a different array element. — francescalus, Jan 14 '21 at 18:57
@IanBush that's not correct. The private copies of the reduction variable **are initialised** with the zero value of the reduction operator (`0.0` for `+`) The problem is that the reduced value is then added to the original variable and that one is **uninitialised**. — Hristo Iliev, Jan 15 '21 at 19:55
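A minimal sketch of the semantics described in this comment, assuming gfortran with `-fopenmp`; the module and function names `reduction_semantics_demo` and `reduce_from` are hypothetical, not from the thread. The shared variable deliberately starts at 100.0 instead of 0.0, so after the reduction it holds the starting value plus the combined partial sums. With an uninitialised original `SUM2`, that starting value is whatever garbage happens to be in memory.

```fortran
module reduction_semantics_demo
   implicit none
contains
   ! Hypothetical helper: the shared variable starts at START, then each of
   ! N loop iterations adds 1.0 to it inside an OpenMP reduction.
   function reduce_from(start, n) result(res)
      real, intent(in)    :: start
      integer, intent(in) :: n
      real    :: res
      real    :: s
      integer :: i

      s = start   ! value held by the ORIGINAL (shared) variable

      !$omp parallel do reduction(+:s) num_threads(2)
      do i = 1, n
         ! Each thread updates its own private copy of s,
         ! which OpenMP initialises to 0.0 (the identity of +).
         s = s + 1.0
      end do
      !$omp end parallel do

      ! At the end of the construct the private partial sums are added to
      ! the original value, so s = start + n rather than just n.
      res = s
   end function reduce_from
end module reduction_semantics_demo
```

For example, `print *, reduce_from(100.0, 10)` should print 110.0 rather than 10.0. Compiled without `-fopenmp`, the directives are treated as comments and the result is the same.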
Put `SUM2 = 0.0` before the parallel region and your code will produce the correct result with and without the reduction clause. — Hristo Iliev, Jan 15 '21 at 19:56
@HristoIliev Yes, that is what I meant, but in my rush screwed it up slightly. Thank you. — Ian Bush, Jan 15 '21 at 19:56
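Applying the fix from the comments to the question's loop gives something like the following sketch. The module and subroutine names (`dot_sums_mod`, `dot_sums`) are hypothetical, and the loop is wrapped in a subroutine only so that it can be called and checked.

```fortran
module dot_sums_mod
   implicit none
contains
   ! Hypothetical wrapper around the loop from the question, with the fix
   ! suggested in the comments: zero SUM2 before the parallel region.
   subroutine dot_sums(a, b, sum2)
      real, intent(in)  :: a(:), b(:)
      real, intent(out) :: sum2(2)
      integer :: i, j

      ! The fix: the original (shared) SUM2 must hold 0.0, because
      ! REDUCTION(+:SUM2) adds the per-thread results to it on exit.
      sum2 = 0.0

      !$omp parallel do private(i) reduction(+:sum2)
      do j = 1, 2
         sum2(j) = 0.0   ! redundant but harmless: resets the private copy
         do i = 1, size(a)
            sum2(j) = sum2(j) + a(i) * b(i)
         end do
      end do
      !$omp end parallel do
   end subroutine dot_sums
end module dot_sums_mod
```

Called with `A(I) = I` and `B = A` for `I = 1, 100`, both elements of `SUM2` should come out as 338350.0 on every run, whatever the number of threads. Dropping `reduction(+:sum2)` should also give the correct result, since each element is written by only one thread.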
While there is no semantic reason to have `reduction` here, there is a very good performance reason. `SUM2` is two elements long and there is a 15/16 (94%) chance that both elements will end up in the same cache line. If that happens and if the inner loop is not optimised to hold the intermediate value in a register, the code execution will suffer from *false sharing* and it may perform way worse than the single-threaded version. With the `reduction` there, each thread works with a private copy of `SUM2` and no false sharing occurs. — Hristo Iliev, Jan 15 '21 at 20:08
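Without the `reduction` clause, one way to sidestep the false sharing mentioned above, without relying on the compiler's register allocation, is to accumulate into a private scalar and write `SUM2(J)` only once per iteration of `J`. This manual accumulator is a swapped-in alternative, not something proposed in the thread, and the names `dot_sums_local_mod` and `dot_sums_local` are hypothetical.

```fortran
module dot_sums_local_mod
   implicit none
contains
   ! Hypothetical alternative without REDUCTION: each iteration J is owned
   ! by exactly one thread, so there is no race on SUM2(J). The running sum
   ! lives in the private scalar ACC, and SUM2(J) is written only once, so
   ! the threads do not keep writing to a possibly shared cache line.
   subroutine dot_sums_local(a, b, sum2)
      real, intent(in)  :: a(:), b(:)
      real, intent(out) :: sum2(2)
      integer :: i, j
      real    :: acc

      !$omp parallel do private(i, acc)
      do j = 1, 2
         acc = 0.0
         do i = 1, size(a)
            acc = acc + a(i) * b(i)
         end do
         sum2(j) = acc   ! single store per element
      end do
      !$omp end parallel do
   end subroutine dot_sums_local
end module dot_sums_local_mod
```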
What version of gfortran are you using? I think this problem should not occur in modern gfortran. — Noureddine, Jan 17 '21 at 13:12
