2

Consider the following minimal C code example. When compiling and executing with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal2.c && ./a.out (homebrew GCC 5.2.0 on OS X 10.11), this usually produces the correct behavior, i.e. seven lines with the same number. But sometimes, this happens:

[ ] bsum=1.893293142303100e+03
[1] asum=1.893293142303100e+03
[2] asum=1.893293142303100e+03
[0] asum=1.893293142303100e+03
[3] asum=3.786586284606200e+03
[ ] bsum=1.893293142303100e+03
[ ] asum=3.786586284606200e+03
equal: 0

It looks like a race condition, but my code seems fine to me. What am I doing wrong?

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#define ID omp_get_thread_num()
#else
#define ID 0
#endif
#define N 1400

double a[N];

double verify() {
    int i;
    double bsum = 0.0;
    for (i = 0; i < N; i++) {
        bsum += a[i] * a[i];
    }
    fprintf(stderr, "[ ] bsum=%.15e\n", bsum);
    return bsum;
}

int main(int argc, char *argv[]) {
    int i;
    double asum = 0.0, bsum;
    srand((unsigned int)time(NULL));
    //srand(1445167001); // fails on my machine
    for (i = 0; i < N; i++) {
        a[i] = 2 * (double)rand()/(double)RAND_MAX;
    }
    bsum = verify();
    #pragma omp parallel shared(asum)
    {
        #pragma omp for reduction(+: asum)
        for (i = 0; i < N; i++) {
            asum += a[i] * a[i];
        }
        fprintf(stderr, "[%d] asum=%.15e\n", ID, asum);
    }
    bsum = verify();
    fprintf(stderr, "[ ] asum=%.15e\n", asum);
    return 0;
}

EDIT: Gilles brought to my attention that the errors beginning at the 15th significant digit are normal as I overestimated the precision of a double. I also cannot reproduce the faulty behavior with 2x the correct number on the Debian machine, so this might be homebrew gcc or Mac related.

I had a problem with a similar issue here, but the two do not seem to be related (at least in my eyes), so I started this as a separate question.

Community
  • 1
  • 1
CrypticDots
  • 314
  • 1
  • 8
  • 1
    Aside: it will help you to debug by commenting out the line `srand((unsigned int)time(NULL));` so that each run is repeatable. If the default seed works, find a `time()` seed that doesn't and print the seed value so you can use it temporarily to track down the fault. – Weather Vane Oct 17 '15 at 19:48
  • `srand(782985985);` fails to add numbers correctly on my machines. – CrypticDots Oct 17 '15 at 19:51
  • 1
    Just a remark that the faulty value is twice the right one... – Gilles Oct 17 '15 at 19:56
  • @Gilles In this case yes, but not always. Sometimes they only differ beginning at the 15th digit after the dot. – CrypticDots Oct 17 '15 at 20:14
  • Why not use 'pragma omp parallel for reduction' in one single stanza? I'm not sure 'a' should be shared, and if you don't specify, and use it in the reduction, the compiler will do the right thing. – Jack Wasey Oct 17 '15 at 22:21
  • @JackWasey Using `pragma omp parallel for` or using it like I did does not make a difference (except less lines of output). `a` is automatically shared according to the OpenMP manual, also sharing `asum` is not really necessary. – CrypticDots Oct 17 '15 at 22:59
  • 1
    The difference after the 15th digit or so is perfectly normal, since the summation order changed between the sequential and parallel loop. The way errors are propagated is different, leading to slightly different results. So nothing to worry about (well nothing to worry more than in any other parallel algorithm). The output you showed however is totally different: this is genuinely worrying! Are you sure you can reproduce it? – Gilles Oct 18 '15 at 05:37
  • @Gilles Thank you for your valuable comment. You're right, `double` only has about 15 significant digits, so this is okay. Yes I can reproduce it, but only on my Mac with OS X 10.11 homebrew gcc 5.2.0. So this might be Mac or homebrew gcc related or just my machine. I edited the question to reflect that. – CrypticDots Oct 18 '15 at 11:40
  • I cannot reproduce this behavior on OSX 10.9 with homebrew gcc 5.2.0. Scary. – Chiel Oct 18 '15 at 11:51
  • Does your error always enter in the thread with the highest ID? – Chiel Oct 18 '15 at 11:57
  • With 2 threads, usually [1] is 2x the value. With 4 threads, [3] (sometimes also [2]) have 2x value. With 8 threads, it is random: e.g. [1][2][4] are correct, [0][3][5] have 2x value, [6][7] have 3x value but also other combinations. The more threads, the more likely are errors. With 16 threads, I always get the strange behavior. – CrypticDots Oct 18 '15 at 12:05
  • I just tested an older Mac with OS X 10.11 and homebrew gcc 5.2.0: unable to reproduce the error :/ – CrypticDots Oct 18 '15 at 12:27

1 Answers1

0

I strongly suspect that this is because floating-point addition is not associative. As a result, OpenMP sums the multiplications in different orders, yielding slightly different results.

The OpenMP 4.0 spec, section 1.3 Execution Model says:

For example, a serial addition reduction may have a different pattern of addition associations than a parallel reduction. These different associations may change the results of floating-point addition.

See OpenMP parallel for reduction delivers wrong results for a suggested solution.

mattst88
  • 1,462
  • 13
  • 21
  • My problem is that `asum` has exactly double or triple the value it should have. I don't think this is related to a summation precision problem. I also implemented the Kahan summation after reading your answer and still have that problem. But since is hasn't occurred on any other computer so far, I guess this is an unsolvable problem and related to my computer. – CrypticDots Oct 21 '15 at 09:20