Consider the following minimal C example. Compiling and running it with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal2.c && ./a.out (Homebrew GCC 5.2.0 on OS X 10.11) usually produces the expected behavior, i.e. seven lines showing the same number. But sometimes I get this:
[ ] bsum=1.893293142303100e+03
[1] asum=1.893293142303100e+03
[2] asum=1.893293142303100e+03
[0] asum=1.893293142303100e+03
[3] asum=3.786586284606200e+03
[ ] bsum=1.893293142303100e+03
[ ] asum=3.786586284606200e+03
equal: 0
It looks like a race condition, but my code seems fine to me. What am I doing wrong?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#define ID omp_get_thread_num()
#else
#define ID 0
#endif

#define N 1400

double a[N];

/* Sequential reference sum of the squares. */
double verify() {
    int i;
    double bsum = 0.0;
    for (i = 0; i < N; i++) {
        bsum += a[i] * a[i];
    }
    fprintf(stderr, "[ ] bsum=%.15e\n", bsum);
    return bsum;
}

int main(int argc, char *argv[]) {
    int i;
    double asum = 0.0, bsum;

    srand((unsigned int)time(NULL));
    //srand(1445167001); // fails on my machine
    for (i = 0; i < N; i++) {
        a[i] = 2 * (double)rand() / (double)RAND_MAX;
    }

    bsum = verify();

    /* Parallel sum of the same squares via a reduction. */
    #pragma omp parallel shared(asum)
    {
        #pragma omp for reduction(+: asum)
        for (i = 0; i < N; i++) {
            asum += a[i] * a[i];
        }
        fprintf(stderr, "[%d] asum=%.15e\n", ID, asum);
    }

    bsum = verify();
    fprintf(stderr, "[ ] asum=%.15e\n", asum);
    fprintf(stderr, "equal: %d\n", asum == bsum);
    return 0;
}
EDIT: Gilles brought to my attention that differences starting at the 15th significant digit are normal; I had overestimated the precision of a double, which carries only about 15-16 significant decimal digits, and a parallel reduction adds its partial sums in an unspecified order, so the last digits may legitimately differ. I also cannot reproduce the faulty behavior with twice the correct number on the Debian machine, so that part might be Homebrew GCC or Mac related.
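To convince myself of the first point, here is a standalone sketch (the fixed seed, the 4-way chunking, and the 1e-12 relative tolerance are illustrative choices of mine, not taken from the code above) that sums the same squares once left-to-right and once in chunks, mimicking one possible 4-thread reduction schedule:

/* Sketch: floating-point addition is not associative, so summing the
 * same values in a different order can change the last bits of the
 * result. The chunking mimics one possible 4-thread reduction order. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 1400

int main(void) {
    static double a[N];
    double seq = 0.0, chunked = 0.0, partial[4] = {0.0};
    int i, t;

    srand(1445167001); /* fixed seed, for reproducibility only */
    for (i = 0; i < N; i++)
        a[i] = 2 * (double)rand() / (double)RAND_MAX;

    /* Order 1: one left-to-right sweep, as in verify(). */
    for (i = 0; i < N; i++)
        seq += a[i] * a[i];

    /* Order 2: four contiguous partial sums, then combine them,
     * roughly what a 4-thread reduction(+) does. N divides by 4. */
    for (t = 0; t < 4; t++)
        for (i = t * (N / 4); i < (t + 1) * (N / 4); i++)
            partial[t] += a[i] * a[i];
    for (t = 0; t < 4; t++)
        chunked += partial[t];

    fprintf(stderr, "seq    =%.15e\n", seq);
    fprintf(stderr, "chunked=%.15e\n", chunked);
    fprintf(stderr, "bitwise equal: %d\n", seq == chunked);
    /* A relative tolerance well above machine epsilon (~2.2e-16)
     * absorbs reordering noise but still catches a doubled sum. */
    fprintf(stderr, "within 1e-12 : %d\n",
            fabs(seq - chunked) <= 1e-12 * fabs(seq));
    return 0;
}

A relative tolerance a few orders of magnitude above machine epsilon lets reorderings like this pass while a genuine error such as a doubled sum (relative difference 1.0) still fails.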
I ran into a similar issue here, but the two do not seem to be related (at least in my eyes), so I asked this as a separate question.