Related: my answer on "What formally guarantees that non-atomic variables can't see out-of-thin-air values and create a data race like atomic relaxed theoretically can?" explains in more detail that the formal rules of the C++ relaxed-atomic memory model don't exclude "out of thin air" values, although a note does exclude them. This is a problem only for formal verification of programs using `mo_relaxed`, not for real implementations. Even non-atomic variables are safe from this, if you avoid undefined behaviour (which you didn't in the code in this question).
You have data-race Undefined Behaviour on `x` and `y` because they're non-atomic variables, so the C++11 standard has absolutely nothing to say about what's allowed to happen.
It would be relevant to look at this for older language standards without a formal memory model, where people did threading anyway using `volatile` or plain `int` and compiler + asm barriers, and where behaviour could depend on compilers working the way you expect in a case like this. But fortunately the bad old days of "happens to work on current implementations" threading are behind us.
Barriers are not helpful here with nothing to create synchronization; as @davmac explains, nothing requires the barriers to "line up" in the global order of operations. Think of a barrier as an operation that makes the current thread wait for some or all of its previous operations to become globally visible; barriers don't directly interact with other threads.
Out-of-thin-air values are one thing that can happen as a result of that undefined behaviour; the compiler is allowed to do software value-prediction on non-atomic variables, and invent writes to objects that will definitely be written anyway. If there were a release-store, or a relaxed store + a barrier, the compiler might not be allowed to invent writes before it, because that could create
In general, from a C++11 language-lawyer perspective, there's nothing you can do to make your program safe other than a mutex or hand-rolled locking with atomics, to prevent one thread from reading `x` while the other is writing it.
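As a rough illustration of the mutex option, here is a minimal sketch. It assumes the two threads simply copy `x` into `y` and `y` into `x`, as in the relaxed-atomic example further down; the names `shared_x`, `shared_y`, `guard` and `run_locked_exchange` are made up for this sketch and are not from the question.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical names, for illustration only.
int shared_x = 0, shared_y = 0;  // plain non-atomic ints
std::mutex guard;                // protects both shared_x and shared_y

// Runs both threads once and returns {r1, r2}.
std::pair<int, int> run_locked_exchange() {
    shared_x = 0;
    shared_y = 0;
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        std::lock_guard<std::mutex> lock(guard);  // no concurrent access while held
        r1 = shared_x;
        shared_y = r1;
    });
    std::thread t2([&] {
        std::lock_guard<std::mutex> lock(guard);
        r2 = shared_y;
        shared_x = r2;
    });

    t1.join();
    t2.join();
    return {r1, r2};  // join() makes the threads' writes to r1/r2 visible here
}
```

With the lock held around each read+write pair, the two critical sections run one after the other, so there is no data race and the only value either thread can ever copy is the initial 0.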
Relaxed atomics are sufficient to prevent the compiler from inventing writes, without any other cost, except maybe defeating auto-vectorization and stuff if you were counting on other uses of this variable being aggressively optimized.
atomic_int x = 0, y = 0
r1 = x.load(mo_relaxed) | r2 = y.load(mo_relaxed)
y.store(r1, mo_relaxed) | x.store(r2, mo_relaxed)
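For anyone who wants to try this, the two-column pseudocode above could be spelled out in real C++ roughly as follows. This is only a sketch: `mo_relaxed` is taken to mean `std::memory_order_relaxed`, and the function name `run_relaxed_lb` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs the two-thread test once and returns {r1, r2}.
std::pair<int, int> run_relaxed_lb() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(r2, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return {r1, r2};
}
```

Every value that is ever stored is a value that was loaded first, and both variables start at 0, so on any real implementation I'd expect this to return `{0, 0}` every time, even though the formal model has trouble ruling out something like `{42, 42}`.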
Value-prediction could speculatively get a future value for `r2` into the pipeline before thread 2 sees that value from `y`, but it can't actually become visible to other threads until the software or hardware knows for sure that the prediction was correct. (That would be inventing a write.)
e.g. thread 2 is allowed to compile as
    r2 = y.load(mo_relaxed);
    if (r2 == 42) {   // control dependency, not a data dependency
        x.store(42, mo_relaxed);
    } else {
        x.store(r2, mo_relaxed);
    }
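To see why that transformation is legal, it may help to check that the value-predicted version stores exactly what the straightforward version would for any value loaded from `y`. The sketch below does that single-threaded; all three function names are hypothetical, and `mo_relaxed` is again assumed to mean `std::memory_order_relaxed`.

```cpp
#include <atomic>
#include <cassert>

// Thread 2 as written in the source: the store has a data dependency on the load.
void thread2_as_written(std::atomic<int>& x, std::atomic<int>& y) {
    int r2 = y.load(std::memory_order_relaxed);
    x.store(r2, std::memory_order_relaxed);
}

// Thread 2 after software value prediction: only a control dependency remains
// on the predicted path.
void thread2_value_predicted(std::atomic<int>& x, std::atomic<int>& y) {
    int r2 = y.load(std::memory_order_relaxed);
    if (r2 == 42) {
        x.store(42, std::memory_order_relaxed);
    } else {
        x.store(r2, std::memory_order_relaxed);
    }
}

// Hypothetical test helper: runs one version with y preset to y_value
// and reports what ended up in x.
int stored_after(void (*body)(std::atomic<int>&, std::atomic<int>&), int y_value) {
    std::atomic<int> x{-1}, y{y_value};
    body(x, y);
    return x.load(std::memory_order_relaxed);
}
```

Both versions store 42 only when 42 was actually loaded, which is why the branch by itself doesn't invent a value; the `42` store still sits behind the comparison.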
But as I said, `x = 42;` can't become visible to other threads until it's non-speculative (hardware or software speculation), so value prediction can't invent values that other threads can see. The C++11 standard guarantees that atomics
I don't know of / can't think of any mechanism by which a store of `42` could actually be visible to other threads before the `y.load` saw an actual 42 (i.e. LoadStore reordering of a load with a later dependent store). I don't think the C++ standard formally guarantees that, though. Maybe really aggressive inter-thread optimization, if the compiler can prove that `r2` will always be 42 in some cases, could remove even the control dependency?
An acquire-load or release-store would definitely be sufficient to block causality violations. This isn't quite `mo_consume`, because `r2` is used as a value, not a pointer.
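A sketch of what the acquire/release variant might look like, using both an acquire load and a release store in each thread (more than the "either one" the text says is needed). The function name `run_acq_rel_lb` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: same two-thread test, with acquire loads and release stores.
std::pair<int, int> run_acq_rel_lb() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_acquire);
        y.store(r1, std::memory_order_release);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_acquire);
        x.store(r2, std::memory_order_release);
    });

    t1.join();
    t2.join();
    return {r1, r2};
}
```

If each load saw the other thread's store, each store would have to happen-before the load that feeds it, which is a cycle in happens-before. So here the standard itself rules out the circular outcome and `{0, 0}` is the only possible result.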