This code prints the value of x around 5000 but not 10000, why is that?

Question

This code that I have written creates 2 threads and a for loop that iterates 10000 times but the value of x at the end comes out near 5000 instead of 10000, why is that happening?

#include<unistd.h>
#include<stdio.h>
#include<sys/time.h>
#include "omp.h"
using namespace std;
int x=0;
int main(){
    omp_set_num_threads(2);
    #pragma omp parallel for
        for(int i= 0;i<10000;i++){
            x+=1;
        }
    printf("x is: %d\n",x);
}

Possible duplicate of [What is a race condition?](https://stackoverflow.com/questions/34510/what-is-a-race-condition) — Max Langhof, Dec 06 '18 at 13:41
`#pragma omp parallel for reduction(+:x)` is the way to go. It lifts the race condition on `x` and is faster than using atomic operations for all increments. — Brice, Dec 06 '18 at 15:07

score 4 · Answer 1 · answered Dec 06 '18 at 13:12

4

x is not an atomic type and is read and written in different threads. (Thinking that int is an atomic type is a common misconception.)

The behaviour of your program is therefore undefined.

Using std::atomic<int> x; is the fix.

answered Dec 06 '18 at 13:12

Bathsheba

231,907
34
361
483

iirc omp lets you declare variables as `private` such that each thread gets its own copy, though imho since c++11 using a `std::atomic` is the much cleaner solution – 463035818_is_not_an_ai Dec 06 '18 at 13:39
1

From a pure C++ point of view this may be the right answer, but from an OpenMP point of view, and for performance considerations, this is wrong. Please see https://stackoverflow.com/q/41309299/620382 and my answer here for details. – Zulan Dec 06 '18 at 18:53
@Zulan: Nice plug (and a nice answer that I’ve upvoted). – Bathsheba Dec 06 '18 at 19:23

score 1 · Answer 2 · answered Dec 06 '18 at 13:17

1

The reason is, that when multiple threads access the same variable, race conditions can occur. The operation x+=1 can be understand as: x = x + 1. So you first read the value of x and then write x + 1 to x. When you have two threads running and operating on the same value of x, following happens: Thread A reads the value of x which is 0. Thread B reads the value of x which is still 0. Then thread A writes 0+1 to x. And then Thread B writes 0+1 to x. And now you have missed one increment and x is just 1 instead of 2. A fix for this problem might be to use an atomic_int.

answered Dec 06 '18 at 13:17

Kilian

533
4
11

1

maybe offtopic, but you dont need reads **and** writes to have a race condition, it is sufficient to have two threads operating on `x` where at least one is writing, to have a race condition – 463035818_is_not_an_ai Dec 06 '18 at 13:34
1

btw I think this answer could be nice, but currently its formatting makes it a bit hard to follow – 463035818_is_not_an_ai Dec 06 '18 at 13:35

score 1 · Answer 3 · answered Dec 06 '18 at 18:52

Modifying one (shared) value by multiple threads is a race condition and leads to wrong results. If multiple threads work with one value, all of them must only read the value.

The idiomatic solution is to use a OpenMP reduction as follows

#pragma omp parallel for reduction(+:x)
for(int i= 0;i<10000;i++){
    x+=1;
}

Internally, each thread has it's own x and they are added together after the loop.

Using atomics is an alternative, but will perform significantly worse. Atomic operations are more costly in itself and also very bad for caches.

If you use atomics, you should use OpenMP atomics which are applied to the operation, not the variable. I.e.

#pragma omp parallel for
for (int i= 0;i<10000;i++){
    #pragma omp atomic
    x+=1;
}

You should not, as other answers suggest, use C++11 atomics. Using them is explicitly unspecified behavior in OpenMP. See this question for details.

This code prints the value of x around 5000 but not 10000, why is that?

3 Answers3