3

Consider the following minimal C code example. When compiling and executing with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal.c && ./a.out (GCC 4.9.2 on Debian 8), this produces five lines with rho=100 (sometimes also 200 or 400) on my machine. Expected output is of course rho=400 for all five printed lines.

The program is more likely to produce the correct result if I insert more code at // MARKER or place a barrier just there. But even with another barrier, it sometimes fails and so does my program. So the problem seems to be that a is not properly initialized when going into the reduction loop.

The OpenMP 4.0.0 manual even states on page 55 that there is an implicit barrier at the end of a loop construct unless a nowait clause is specified. So a should be set up at this point. What is going wrong here? Am I missing something?

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#define ID omp_get_thread_num()
#else
#define ID 0
#endif

double a[100];

int main(int argc, char *argv[]) {
    int i;
    double rho;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < 100; i++) {
            a[i] = 2;
        }
        // MARKER
        rho = 0.0;
        #pragma omp for reduction(+: rho)
        for (i = 0; i < 100; i++) {
            rho += ((a[i])*(a[i]));
        }
        fprintf(stderr, "[%d] rho=%f\n", ID, rho);
    }
    fprintf(stderr, "[%d] rho=%f\n", ID, rho);
    return 0;
}
CrypticDots
  • 314
  • 1
  • 8
  • [clang seems to produce the correct result](http://melpon.org/wandbox/permlink/Jmkv5VcXKQJ4sMNN) – m.s. Oct 17 '15 at 13:25
  • Do you need a mutex when modifying `rho`? – donjuedo Oct 17 '15 at 13:41
  • I don't think so. Page 167 of the manual says: *The reduction clause specifies a reduction-identifier and one or more list items. For each list item, a private copy is created in each implicit task or SIMD lane, and is initialized with the initializer value of the reduction-identifier. After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier.* – CrypticDots Oct 17 '15 at 13:43
  • The code looks fine to me. However, Just in case, I would explicitly declare `private(i)` at the `parallel` directive's level. I know it is supposed to be privatised, but... Anyway, this suspiciously looks like a compiler bug to me, like if you hadn't the implicit `barrier` at the end of the `omp for`. Maybe adding some explicit ones might help too. – Gilles Oct 17 '15 at 14:30
  • @Gilles Adding an explicit barrier at MARKER does not help (still fails sometimes). Declaring `i` as private does not change the strange behavior and is unnecessary. – CrypticDots Oct 17 '15 at 14:34
  • 1
    now I'm puzzled: I reproduced the issue with GCC 4.9.2, but also (admittedly far less often) with Intel compiler 16.0.0. I still believe the code is correct, but that shakes my conviction. – Gilles Oct 17 '15 at 15:29

1 Answers1

3

OK I've got the answer, but I sweat to get it...

This is a race condition due to the fact that rho is shared and that you initialise it inside the parallel region like this rho = 0.0;

Either initialising it outside of the parallel region, or using a #pragma omp single right before will fix the code...

Gilles
  • 9,269
  • 4
  • 34
  • 53
  • I think you're right although this is hard to verify. At least the problem disappears when moving `rho = 0.0;` or placing a barrier after that statement. So, accepted. Time for round number 2: http://stackoverflow.com/questions/33190809/why-does-openmp-fail-to-sum-these-numbers – CrypticDots Oct 17 '15 at 19:42
  • Good observation! If the OP had not changed her/his scalar code and instead just used `#pragma omp parallel for` twice this would not have happened. – Z boson Oct 19 '15 at 09:14
  • Yes you are right. I did this because I thought it is more efficient because the program only needs to branch out once. Most likely wrong though since threads are only created once and sleeping if no work has to be done and because of compiler optimization. – CrypticDots Oct 21 '15 at 11:52