The effect you're trying to create does not depend on out-of-order execution. That's only one of the things that can create memory reordering. Plus, modern x86 does out-of-order execution but uses its Memory Order Buffer to ensure that stores commit to L1d / become globally visible in program order. (Because x86's memory model only allows StoreLoad reordering, not StoreStore.)
Memory-reordering is separate from instruction execution reordering, because even in-order CPUs use a store buffer to avoid stalling on cache-miss stores.
Related Q&As: "Out-of-order instruction execution: is commit order preserved?" and "Are loads and stores the only instructions that gets reordered?"
A C implementation on an in-order ARM CPU could print either 11 or 33, if x and f ended up in different cache lines.
I assume you compiled with optimization disabled, so your compiler effectively treats all your variables as volatile, i.e. volatile int x,f. Otherwise the while(f==0); loop will compile to if(f==0) { infloop; }, only checking f once. (Data race UB for non-atomic variables is what allows compilers to hoist loads out of loops, but volatile loads always have to be done. See https://electronics.stackexchange.com/questions/387181/mcu-programming-c-o2-optimization-breaks-while-loop#387478)
The stores in the resulting asm / machine code will appear in C source order.
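To make the hoisting point concrete, here is a small hand-written sketch of the two loop shapes. The function names wait_as_if_hoisted and wait_volatile are made up for illustration, and the first function is not actual compiler output: it just spells out the transformation an optimizer is allowed to apply to a spin loop on a plain int.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of what an optimizer MAY turn
 *     while (*flag == 0) {}
 * into when flag points to a plain (non-atomic, non-volatile) int:
 * the load is done once, then the loop never looks at memory again. */
int wait_as_if_hoisted(const int *flag) {
    assert(flag != NULL);
    if (*flag == 0) {
        for (;;) { }        /* infloop: *flag is never re-read */
    }
    return 1;
}

/* With a volatile-qualified object, every iteration has to perform
 * a real load, so a store from another thread is eventually noticed
 * in practice (volatile still gives no ordering guarantees). */
int wait_volatile(const volatile int *flag) {
    assert(flag != NULL);
    while (*flag == 0) { }  /* reloads *flag each time around */
    return 1;
}
```

Both functions return immediately if the flag is already non-zero. They differ only when the flag is still 0 on entry: the first can never leave, the second keeps polling.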
You're compiling for x86, which has a strong memory model: x86 stores are release-stores, and x86 loads are acquire loads. You don't get sequential-consistency, but you get acq_rel for free. (And with un-optimized code, it happens even if you don't ask for it.)
Thus, when compiled without optimization for x86, your program is equivalent to
    _Atomic int x, f;
    int main(){
        ...
        pthread_create
        atomic_store_explicit(&x, 33, memory_order_release);
        atomic_store_explicit(&f, 1, memory_order_release);
        ...
    }
And similarly for the load side. The while(f==0){} is an acquire-load on x86, so having the read side wait until it sees non-zero f guarantees that it also sees x==33.
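For reference, a minimal self-contained sketch of that release/acquire handoff in portable C11 might look like the following. It assumes x starts at 11 (as the 11-or-33 outcomes suggest) and that the reader runs in the created thread. The names reader and run_handoff are hypothetical helpers, not part of your program.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic int x = 11;  /* payload; 11 is the assumed initial value */
static _Atomic int f = 0;   /* "ready" flag */

/* Reader thread: spin until f is non-zero (acquire), then read x. */
static void *reader(void *arg) {
    while (atomic_load_explicit(&f, memory_order_acquire) == 0) { }
    *(int *)arg = atomic_load_explicit(&x, memory_order_acquire);
    return NULL;
}

/* Hypothetical helper: runs one handoff and returns the x the reader saw. */
int run_handoff(void) {
    int seen = -1;
    pthread_t t;

    /* Reset state so the helper can be called repeatedly. */
    atomic_store(&x, 11);
    atomic_store(&f, 0);

    int rc = pthread_create(&t, NULL, reader, &seen);
    assert(rc == 0);
    (void)rc;

    /* Writer side: same two release stores as in the snippet above. */
    atomic_store_explicit(&x, 33, memory_order_release);
    atomic_store_explicit(&f, 1, memory_order_release);

    rc = pthread_join(t, NULL);  /* join makes reading `seen` safe */
    assert(rc == 0);
    return seen;
}
```

Built with something like cc -std=c11 -pthread, run_handoff() should return 33 on any conforming implementation, not just x86: the release store to f pairs with the acquire load in the spin loop, so the earlier store to x is visible once the reader sees f != 0.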
But if you compiled for a weakly-ordered ISA like ARM or PowerPC, the asm-level memory-ordering guarantees there do allow StoreStore and LoadLoad reordering, so it would be possible for your program to print 11 if compiled without optimization.
See also https://preshing.com/20120930/weak-vs-strong-memory-models/