Multithreading and OOO execution.

Question

#include <iostream>
#include <thread>

int main() {
    int  f = 0, x=0;

    std::thread *t  = new std::thread([&f, &x](){ while( f == 0); std::cout << x << std::endl;});
    std::thread *t2 = new std::thread([&f, &x](){ x = 42; f = 1;});

    t->join();
    t2->join();
    return 0;
}

From what I know, it is theoretically possible for the output to be 0, contrary to our intuition (we expect 42). But the CPU can execute instructions out of order, and in fact it is possible to execute the program in that order:

(We assume that the CPU has more than one core.)

So, thread #2 on the second core executes f = 1 first (because of the OOO mechanism), and then thread #1 on the first core executes its code: while( f == 0); std::cout << x << endl. So the output is 0.

I tried to get such output, but I always get 42. I ran the program 1000000 times and the result was always the same: 42.

(I know that this is not safe; there is a data race.)

My questions are:

Am I right or wrong? Why?
If I am right, is it possible to force the output to be 0?
How can I make this code safe? I know about mutexes/semaphores and I could protect f with a mutex, but I have heard something about memory fences; please tell me more.

The use of atomic is actually vital here; your code does not even terminate on my system (gcc6.1 -O3) without it. Also, why the pointer business? Just declare the threads as automatic variables. — Baum mit Augen, Jul 01 '16 at 10:19
@RichardCritten, Are you sure that atomic operations cannot be executed out of order? — Gilgamesz, Jul 01 '16 at 10:34
@Gilgamesz - atomic operations indeed cannot be performed out of order, provided the default memory model (and indeed the only memory model on the x86). Consecutive atomic operations form *sequenced-before* relationships to each other and also any instructions interspersed between them. — Smeeheey, Jul 01 '16 at 10:45
@Gilgamesz it depends; you need to read the documentation, see: http://en.cppreference.com/w/cpp/atomic/memory_order `memory_order_seq_cst` : "all threads observe all modifications (see below) in the same order." — Richard Critten, Jul 01 '16 at 11:36
@RichardCritten: You don't need `seq_cst` here. The best way is `f.store(1, std::memory_order_release)`, which makes sure the store to the flag becomes globally visible after previous stores in the source, but without inserting a full memory barrier to prevent StoreLoad reordering. `mo_release` is free on x86, but `mo_seq_cst` requires an `MFENCE`. — Peter Cordes, Jul 01 '16 at 15:37

score 4 · Accepted Answer · edited May 23 '17 at 12:09

But the CPU can execute instructions out of order, and in fact it is possible to execute the program in that order:

Out-of-order execution is different from reordering of when loads / stores become globally visible. OoOE preserves the illusion of your program running in order. Memory reordering is possible without OoOE. Even an in-order pipelined core will want to buffer its stores. See parts of this answer, for example.

If I am right, is it possible to force the output to be 0?

Not on x86, which only does StoreLoad reordering, not StoreStore reordering. If the compiler reorders the stores to x and f at compile time, then you will sometimes see x==0 after seeing f==1. Otherwise you will never see that.

A short sleep after spawning thread1 before spawning thread2 would also make sure thread1 was already spinning on f before you modify x and f. Then you don't need thread2, and can actually do the stores from the main thread.

Have a look at Jeff Preshing's Memory Reordering Caught In The Act for a real program that does observe run-time memory reordering on x86, once per ~6k iterations on a Nehalem.
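A minimal sketch of the kind of StoreLoad test that article describes, written here with std::atomic so that it has no data race. This is not Preshing's actual program: the function name count_storeload_reorderings and the go start flag are made up for illustration, and spawning fresh threads for every iteration is a simplification that makes reordering much harder to catch than in a tuned test.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the "store buffer" litmus test `iterations` times
// and returns how often BOTH threads read 0, i.e. how often each thread's load
// appeared to take effect before the other thread's store.
int count_storeload_reorderings(int iterations,
                                std::memory_order store_order,
                                std::memory_order load_order) {
    // Only orderings that are valid for a store / for a load are accepted.
    assert(store_order == std::memory_order_relaxed ||
           store_order == std::memory_order_release ||
           store_order == std::memory_order_seq_cst);
    assert(load_order != std::memory_order_release &&
           load_order != std::memory_order_acq_rel);

    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> X{0}, Y{0};
        std::atomic<bool> go{false};  // start gate so both threads begin together
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            while (!go.load(std::memory_order_acquire)) {}
            X.store(1, store_order);   // store to X ...
            r1 = Y.load(load_order);   // ... then load from Y
        });
        std::thread b([&] {
            while (!go.load(std::memory_order_acquire)) {}
            Y.store(1, store_order);   // store to Y ...
            r2 = X.load(load_order);   // ... then load from X
        });

        go.store(true, std::memory_order_release);
        a.join();
        b.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With memory_order_seq_cst for both arguments the result is guaranteed to be 0. With memory_order_relaxed (or release/acquire) a nonzero count is allowed, even on x86, because the store can still be sitting in the store buffer when the load executes; whether you actually observe it depends on timing and hardware.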

On a weakly-ordered architecture, you could maybe see StoreStore reordering at run-time with something like your test program. But you'd likely have to arrange for the variables to be in different cache lines! And you'd need to test in a loop, not just once per program invocation.
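A rough sketch of such a looped test, under a few assumptions: the names PaddedWord and count_stale_reads are hypothetical, 64 bytes is assumed to be the cache-line size, and x and f are relaxed atomics rather than plain ints so the test itself has no undefined behaviour. Each round the writer stores the round number to x and then to f; the reader counts how often it sees a flag value newer than the payload.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines. The alignment keeps x and f on separate lines.
struct alignas(64) PaddedWord {
    std::atomic<std::uint32_t> value{0};
};

// Hypothetical helper: returns how many times the reader saw the flag for a
// round while the payload for that round was not yet visible.
int count_stale_reads(std::uint32_t rounds,
                      std::memory_order store_order,
                      std::memory_order load_order) {
    // Only orderings that are valid for a store / for a load are accepted.
    assert(store_order == std::memory_order_relaxed ||
           store_order == std::memory_order_release ||
           store_order == std::memory_order_seq_cst);
    assert(load_order != std::memory_order_release &&
           load_order != std::memory_order_acq_rel);

    PaddedWord x, f;
    int stale = 0;

    std::thread reader([&] {
        std::uint32_t seen = 0;
        while (seen < rounds) {
            seen = f.value.load(load_order);                       // flag first
            std::uint32_t payload = x.value.load(std::memory_order_relaxed);
            if (payload < seen) ++stale;   // flag visible, payload not yet
        }
    });
    std::thread writer([&] {
        for (std::uint32_t round = 1; round <= rounds; ++round) {
            x.value.store(round, std::memory_order_relaxed);       // payload
            f.value.store(round, store_order);                     // then flag
        }
    });

    reader.join();
    writer.join();
    return stale;
}
```

Calling count_stale_reads(n, std::memory_order_release, std::memory_order_acquire) must return 0 on any conforming implementation. With memory_order_relaxed for both arguments, a nonzero count is permitted and might show up on a weakly-ordered machine; on x86 it could only come from compile-time reordering.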

How can I make this code safe? I know about mutexes/semaphores and I could protect f with a mutex, but I have heard something about memory fences; please tell me more.

Use C++11 std::atomic to get acquire/release semantics on your accesses to f.

std::atomic<uint32_t> f{0};   // flag to indicate when x is ready
uint32_t x = 0;

...

// don't use  new  when a local with automatic storage works fine
std::thread t1 = std::thread([&f, &x](){
    while( f.load(std::memory_order_acquire) == 0);   // spin until the flag is set
    std::cout << x << std::endl;});

// or sleep a few ms, and do t2's work in the main thread
std::thread t2 = std::thread([&f, &x](){
    x = 42; f.store(1, std::memory_order_release);});

The default memory ordering for something like f = 1 is mo_seq_cst, which requires a full barrier on x86 (an MFENCE, or a locked instruction such as XCHG for the store), or an equivalent expensive barrier on other architectures.

On x86, the weaker memory orderings (acquire and release) just prevent compile-time reordering, but don't require any barrier instructions.
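Putting the pieces together, a self-contained sketch of the release/acquire handoff could look like the following. The function name handoff_once and the observed variable are illustrative only; the value is returned instead of printed so that it can be checked.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: returns the value of x that the reader thread observed
// after it saw the flag f become nonzero.
std::uint32_t handoff_once() {
    std::atomic<std::uint32_t> f{0};  // flag: nonzero once x is ready
    std::uint32_t x = 0;              // plain (non-atomic) payload, published via f
    std::uint32_t observed = 0;

    std::thread reader([&] {
        // The acquire load pairs with the release store below.
        while (f.load(std::memory_order_acquire) == 0) {}
        observed = x;                 // happens after the writer's x = 42
    });
    std::thread writer([&] {
        x = 42;
        f.store(1, std::memory_order_release);  // publishes the store to x
    });

    reader.join();
    writer.join();

    // The release/acquire pair guarantees the reader saw the payload.
    assert(observed == 42);
    return observed;
}
```

Writing f = 1 and f == 0 instead of the explicit store and load would also be correct, just with the stronger default seq_cst ordering described above.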

std::atomic also prevents the compiler from hoisting the load of f out of the while loop in thread1, like @Baum's comment describes. (This is because atomic has semantics like volatile, where it's assumed that the stored value can change asynchronously. For non-atomic variables, since data races are undefined behaviour, the compiler normally can hoist loads out of loops, unless alias analysis fails to prove that stores through pointers inside the loop can't modify the value.)

Peter Cordes, thanks! I have to read something about alias analysis yet. — Gilgamesz, Jul 15 '16 at 22:30
