
Let's consider the following two-thread concurrent program in C++:

x, y are globals; r1, r2 are thread-local; loads and stores of int are atomic. Memory model: C++11

int x = 0, y = 0
r1 = x   | r2 = y 
y = r1   | x = r2

A compiler is allowed to compile it as:

int x = 0, y = 0
r1 = x   | r2 = 42 
y = r1   | x = r2
         | if(y != 42) 
         |    x = r2 = y    

And while it is intra-thread consistent, it can produce wild results: an execution of that program may end with (x, y) = (42, 42).

This is called the out-of-thin-air values problem. It exists, and we have to live with it.

My question is: Does a memory barrier prevent a compiler from doing wild optimizations that result in out-of-thin-air values?

For example:

[fence] = atomic_thread_fence(memory_order_seq_cst);

int x = 0, y = 0
r1 = x   | r2 = y 
[fence]  | [fence]
y = r1   | x = r2
Walter
Gilgamesz

2 Answers


Related: my answer on What formally guarantees that non-atomic variables can't see out-of-thin-air values and create a data race like atomic relaxed theoretically can? explains in more detail that the formal rules of the C++ relaxed-atomic memory model don't exclude "out of thin air" values, although they do exclude them in a note. This is a problem only for formal verification of programs using mo_relaxed, not for real implementations. Even non-atomic variables are safe from this, if you avoid undefined behaviour (which the code in this question doesn't).


You have data-race Undefined Behaviour on x and y because they're non-atomic variables, so the C++11 standard has absolutely nothing to say about what's allowed to happen.

It would be relevant to look at this for older language standards without a formal memory model, where people did threading anyway using volatile or plain int plus compiler and asm barriers, and where behaviour could depend on compilers working the way you expect in a case like this. But fortunately the bad old days of "happens to work on current implementations" threading are behind us.


Barriers are not helpful here, because there is nothing to create synchronization; as @davmac explains, nothing requires the barriers to "line up" in the global order of operations. Think of a barrier as an operation that makes the current thread wait for some or all of its previous operations to become globally visible; barriers don't directly interact with other threads.


Out-of-thin-air values are one thing that can happen as a result of that undefined behaviour; the compiler is allowed to do software value-prediction on non-atomic variables, and to invent writes to objects that will definitely be written anyway. If there were a release-store, or a relaxed store plus a barrier, the compiler might not be allowed to invent writes before it, because that could make a value visible to other threads earlier than the ordering those operations are supposed to provide.

In general, from a C++11 language-lawyer perspective, there's nothing you can do to make your program safe (other than a mutex, or hand-rolled locking with atomics, to prevent one thread from reading x while the other is writing it).


Relaxed atomics are sufficient to prevent the compiler from inventing writes without any other cost.

Except maybe for defeating auto-vectorization and the like, if you were counting on other uses of these variables being aggressively optimized.

atomic_int x = 0, y = 0
r1 = x.load(mo_relaxed)   | r2 = y.load(mo_relaxed)
y.store(r1, mo_relaxed)   | x.store(r2, mo_relaxed)

Value-prediction could speculatively get a future value for r2 into the pipeline before thread 2 sees that value from y, but it can't actually become visible to other threads until the software or hardware knows for sure that the prediction was correct. (That would be inventing a write).

e.g. thread 2 is allowed to compile as

r2 = y.load(mo_relaxed);
if (r2 == 42) {                   // control dependency, not a data dependency
    x.store(42, mo_relaxed);
} else {
    x.store(r2, mo_relaxed);
}

But as I said, x = 42; can't become visible to other threads until it's non-speculative (whether the speculation is in hardware or software), so value prediction can't invent values that other threads can see. The C++11 standard at least intends (if not formally guarantees) that atomic loads only return values that some store actually placed in the object.

I don't know / can't think of any mechanism by which a store of 42 could actually be visible to other threads before the y.load saw an actual 42. (i.e. LoadStore reordering of a load with a later dependent store). I don't think the C++ standard formally guarantees that, though. Maybe really aggressive inter-thread optimization if the compiler can prove that r2 will always be 42 in some cases, and remove even the control dependency?

An acquire-load or release-store would definitely be sufficient to block causality violations. This isn't quite mo_consume, because r2 is used as a value, not a pointer.

Peter Cordes
  • One could assume that what the OP shows is pseudo-code and the reads and writes to `x` and `y` are relaxed atomic reads (they do note that load and store "to `int`" are atomic, but yeah that's not really clear). Usually when these issues are discussed they involve concurrent access to relaxed atomic vars. This "thin air" behavior is currently allowed by the C++11 standard, and is example 2 [here](https://www.cl.cam.ac.uk/~pes20/cpp/notes42.html). Maybe they fixed it in later revisions, I'm not sure. – BeeOnRope Jul 09 '18 at 03:18
  • @BeeOnRope: Thanks for the link. But the proposed optimization in the question, claiming that the compiler can introduce an unconditional `x = 42` (and then re-read `y`? looks confused...), is only possible with non-`atomic` variables. The compiler can allow reordering and break dependencies, but it can't invent writes outside of conditional branches. Thus parts of the question only make sense if it's talking about real `int`, as does the assertion that the thin-air problem is real and we have to live with it. It's *not* real (for this case) on any actual implementation, esp. not `[x86]`. – Peter Cordes Jul 09 '18 at 03:48
  • @BeeOnRope: Hmm, [the wording in ISO C++ 29.4 subsection 3](http://eel.is/c++draft/atomics.order#3) only applies to `seq_cst`, requiring that an atomic load sees some value that was actually in the object at some point. But [subsection 9](http://eel.is/c++draft/atomics.order#9) only says implementations *should* avoid out-of-thin-air values, without making it a hard requirement. That of course prevents you non-speculatively seeing `42` if the object never contained `42` during the life of the program. – Peter Cordes Jul 09 '18 at 03:54
  • See my link - the example the OP uses is identical to example 2, which is the classic "out of thin air" problem. Consider the OP's subsequent code with the conditional as an example of how it can happen, for example, as a result of a speculative hardware mechanism. The _should_ part of subsection 9 is really just a cop-out - as the rest of the link shows, it is hard to even formalize how to avoid this situation, as absurd as the "reduced" examples might seem. – BeeOnRope Jul 09 '18 at 04:00
  • @BeeOnRope: That paper seems to miss the fact that it can still only happen as a result of mis-speculation which must be rolled back. If you made those speculative stores visible to other threads, then the other threads have to be rolled back, too. They say "*We don't expect future hardware to adopt the load-value prediction that would be required to make it observable.*" but such value-prediction would have to be perfect before you could allow it to be non-speculative, and then it couldn't predict an out-of-thin-air value. (Or more likely, checked before retirement like my answer says.) – Peter Cordes Jul 09 '18 at 04:06
  • 1
    Consider one mechanism by which this could happen: value speculation combined with CPUs that see each others speculative store buffer state (e.g., SMT siblings on a design with such a "feature"). Each CPU could speculate a read of 42 (or any value) and then do the write and then each could confirm their speculation against the store buffer of the other CPU, and the read could appear out of thin air. Such an architecture with cross-logical-core snooping of speculative state seems unlikely, however! – BeeOnRope Jul 09 '18 at 04:06
  • _but such value-prediction would have to be perfect before you could allow it to be non-speculative_ - by "perfect" I guess you mean it has to be correct, right? Yes, definitely, but as above, the prediction can check out as correct in the future based on the action of the other thread. When they say "[w]e don't expect future hardware to adopt the load-value prediction that would be required to make it observable" - I take it as saying that they don't expect the reading speculative state part to become reality, since dismissing "value prediction" entirely seems quite unlikely. – BeeOnRope Jul 09 '18 at 04:10
  • @BeeOnRope: oh right, threads seeing speculative stores from other threads would allow them to each confirm their speculation if reordering is also allowed. Yup, that's the idea I was missing. – Peter Cordes Jul 09 '18 at 04:11
  • ... since value prediction is an actual thing in uarch research and has been (maybe) "just beyond the horizon" for real hardware for some time. – BeeOnRope Jul 09 '18 at 04:11
  • Obviously, I meant std::atomic; I should have mentioned it. I was just wondering whether a barrier prevents the compiler from doing wild optimizations (ones that result in OOTA values), and I still don't know. (I know a barrier is not a magical instruction that synchronizes threads in a magical way; I just supposed that if the compiler sees a barrier, it refrains from doing predictions - that's why I put in the seemingly stupid barrier.) – Gilgamesz Jul 09 '18 at 17:06
  • Obviously, thank you both a lot (@BeeOnRope and @PeterCordes) for such an informative discussion between great folks. I'm proud that I raised it ;) ! – Gilgamesz Jul 09 '18 at 17:07
  • @Gil - FWIW I _think_ but am not sure, that the answer to your original question (assuming relaxed atomic ops) is that yes, the seq_cst fence prevents the possibility of out-of-thin-air reads, agreeing with the last part of Peter's answer. – BeeOnRope Jul 09 '18 at 17:11
  • However, I would like to understand it as well ;-). As I understand it, the C++ standard allows a speculated load to be visible to observers, but requires the possibility of rolling back a mis-speculated value on **all** threads (so if thread 1 observed a mis-speculatively published value loaded by thread 2, there must exist a mechanism to roll it back in both threads). But, as @BeeOnRope points out, that mechanism can "fail", yes? – Gilgamesz Jul 09 '18 at 17:13
  • Well, the C++ standard doesn't talk in terms of load speculation or anything like that. It has a more abstract, formal model, and it is this model that "admits" the various out-of-thin-air issues like your example, simply because the language doesn't _exclude_ them. That is, you won't find language that says speculative loads are allowed; you'll just find (more or less) a list of restrictions that a candidate execution must follow, and anything that satisfies them is allowed. The restrictions are loose enough that the thin-air execution is allowed, because it is not _disallowed_. – BeeOnRope Jul 09 '18 at 17:45
  • So anything that results in an allowed execution is fair game, but then the question becomes: is there a reasonable hardware or compiler that would cause this result, and does the result make programs hard to reason about? If the answer to both questions is simultaneously yes for a single behavior, you have a tough trade-off to make. In the most well-known examples, however, the answers are generally understood to be "no" and "yes" respectively, indicating that ideally the behavior would be disallowed - yet the standard allows it because it is _hard_ to specify... @gil – BeeOnRope Jul 09 '18 at 17:51

Not by itself. In your example, there is nothing synchronising the two threads. In particular, the fences in the two threads do not cause the threads to synchronise at that point; for example, you might get the following sequence:

  (Thread #1)       |   (Thread #2)
r1 = x              |
[fence]             |
y = junk temporary  |
                    | r2 = y    // junk!
                    | [fence]
                    | x = r2
y = r1              |

The simplest way to avoid out-of-thin-air results is to use atomic integers: if x and y are atomic, then they cannot have "out of thin air" values:

std::atomic_int x = 0, y = 0;
int r1 = x;    |    int r2 = y;
y = r1;        |    x = r2;
davmac
  • You don't need to use `seq_cst` atomic load/store, do you? That's far more expensive than you need if you just want to prevent out-of-thin-air values. I think release stores or acquire loads would be sufficient to prevent LoadStore reordering. (You really only need to defeat value-prediction, so any normal implementation is probably safe with relaxed, and definitely with `mo_consume`, but that gets strengthened to `mo_acquire` by current implementations.) – Peter Cordes Jul 08 '18 at 21:06
  • Err actually `relaxed` is enough to prevent out-of-thin-air values because the compiler can't invent writes to an atomic. I was thinking about storing the correct value of `y` before actually seeing it, violating causality. – Peter Cordes Jul 08 '18 at 21:12
  • @PeterCordes no, you don't need `seq_cst` atomic load/store, and yes I think `relaxed` would be enough. But for keeping the example simple... – davmac Jul 08 '18 at 21:37
  • 1
    seq_cst gives you so much (and costs so much) that it almost makes the answer un-interesting. Like, of course that's safe. I added an answer with mo_relaxed. – Peter Cordes Jul 08 '18 at 21:43
  • 2
    In the C++11 memory, you can, in theory, get out-of-thin-air values as the OP describes even with atomic values (relaxed reads), this is a known flaw in the model. In practice, compilers are unlikely to generate code that exploits this, and hardware doesn't generally cause it either. Still, it's a flaw they were actively trying to fix after C++11. [A good background read](https://www.cl.cam.ac.uk/~pes20/cpp/notes42.html). – BeeOnRope Jul 09 '18 at 03:21