Consider the following minimal C code example. When compiling and executing with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal.c && ./a.out
(GCC 4.9.2 on Debian 8), this produces five lines with rho=100
(sometimes also 200 or 400) on my machine. Expected output is of course rho=400
for all five printed lines.
The program is more likely to produce the correct result if I insert more code at // MARKER
or place a barrier just there. But even with another barrier, it sometimes fails and so does my program. So the problem seems to be that a
is not properly initialized when going into the reduction loop.
The OpenMP 4.0.0 manual even states on page 55 that there is an implicit barrier at the end of a loop construct unless a nowait clause is specified. So a
should be set up at this point. What is going wrong here? Am I missing something?
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#define ID omp_get_thread_num()
#else
#define ID 0
#endif
double a[100];
int main(int argc, char *argv[]) {
int i;
double rho;
#pragma omp parallel
{
#pragma omp for
for (i = 0; i < 100; i++) {
a[i] = 2;
}
// MARKER
rho = 0.0;
#pragma omp for reduction(+: rho)
for (i = 0; i < 100; i++) {
rho += ((a[i])*(a[i]));
}
fprintf(stderr, "[%d] rho=%f\n", ID, rho);
}
fprintf(stderr, "[%d] rho=%f\n", ID, rho);
return 0;
}