The concept of "later" is not one which the C++ standard defines or uses, so literally speaking, this is not a question that can be answered with reference to the standard.
If we instead use the concept of happens before as defined in [intro.races p10], then the answer to your question is yes. (I am using C++20 final draft N4860 here).
`M` has a modification order [intro.races p4], which is a total order. By write-write coherence [intro.races p15], if #1 happens before #2, then the side effect storing the value `1` precedes the store of the value `2` in the modification order of `M`.

Then by write-read coherence [intro.races p17], if #2 happens before #3, then #3 must take its value from #2, or from some side effect which follows #2 in the modification order of `M`. But assuming that the program contains no other stores to `M`, there are no other side effects that follow #2 in the modification order, so #3 must in fact take its value from #2, which is to say that #3 must load the value `2`. In particular, #3 cannot take its value from #1, since #1 precedes #2 in the modification order of `M` and not the other way around. (The modification order is defined as a total order, which means it cannot contain cycles, so it is not possible for #1 and #2 to both precede each other.)
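For concreteness, here is a minimal sketch of the kind of program I take the question to be describing. The function names and the way the three operations are spread across threads are my assumptions, not something fixed by the standard; the question's acquire/release orderings are kept as written:

```cpp
#include <atomic>

std::atomic<int> M{0};

void thread1() {
    M.store(1, std::memory_order_release);      // #1
}

void thread2() {
    // Assumed to run "later" than #1, i.e. #1 happens before #2.
    M.store(2, std::memory_order_release);      // #2
}

void thread3() {
    // Assumed to run "later" than #2, i.e. #2 happens before #3.
    int r = M.load(std::memory_order_acquire);  // #3
    // Given the happens-before chain, r is guaranteed to be 2.
    (void)r;
}
```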
Note that the acquire and release orderings you have attached to operations #1, #2, #3 are totally irrelevant in this analysis, and everything would be the same if they were all `relaxed`. Where memory ordering would matter is in the operations (not shown in your example) that enforce the assumption that #1 happens before #2 happens before #3.
For example, suppose that in Thread 1, operation #1 was sequenced before (i.e. precedes in program order) a store to some other atomic object `Z` of a particular value (let's say `true`). And likewise, that Thread 2 did not perform store #2 until after loading from `Z` and observing that the value loaded was `true`. Then the store to `Z` in Thread 1 must be `release` (or `seq_cst`), and the load of `Z` in Thread 2 must be `acquire` (or `seq_cst`). That way, you get the release store to `Z` to synchronize with the acquire load [atomics.order p2], and by chasing through [intro.races p9-10], you conclude that #1 happens before #2. But again, no particular ordering would be needed on the accesses to `M`.
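A sketch of that arrangement might look like the following. The flag `Z` is the one from the example above; the spin loop standing in for "Thread 2 waits until it sees `true`" is my own choice of mechanism:

```cpp
#include <atomic>

std::atomic<int> M{0};
std::atomic<bool> Z{false};

void thread1() {
    M.store(1, std::memory_order_relaxed);      // #1: relaxed is enough here
    Z.store(true, std::memory_order_release);   // release store to Z
}

void thread2() {
    while (!Z.load(std::memory_order_acquire))  // acquire load of Z
        ;                                       // spin until it reads true
    // The acquire load synchronizes with the release store, so everything
    // sequenced before that store (including #1) happens before this point.
    M.store(2, std::memory_order_relaxed);      // #2: now #1 happens before #2
}
```

The same pattern with a second flag between Thread 2 and Thread 3 would likewise establish that #2 happens before #3.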
In fact, if you have the happens-before ordering, then everything works fine even if `M` is not atomic at all, but is just an ordinary `int`, and #1, #2, #3 are just ordinary non-atomic writes and reads of `M`. The happens-before ordering ensures that these accesses to `M` do not cause a data race in the sense of [intro.races p21], so the behavior of the program is well-defined. We can then refer to [intro.races p13], the "visible side effect" rule. It is true that #2 happens before #3, and there is no other side effect X on `M` such that #2 happens before X and X happens before #3, so #2 is the visible side effect with respect to #3. (In particular, this cannot be true of X = #1, because #1 happens before #2 and therefore not the other way around; by [intro.races p10] the "happens before" relation must not contain a cycle.) Thus, the value determined by #3 must be the value stored by #2, namely the value `2`.
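As a sketch under those assumptions, with `M` as a plain `int` and two atomic flags (the names `Z1` and `Z2` are mine) carrying the happens-before chain:

```cpp
#include <atomic>

int M = 0;                          // ordinary, non-atomic object
std::atomic<bool> Z1{false};
std::atomic<bool> Z2{false};

void thread1() {
    M = 1;                                       // #1: non-atomic write
    Z1.store(true, std::memory_order_release);
}

void thread2() {
    while (!Z1.load(std::memory_order_acquire))  // makes #1 happen before #2
        ;
    M = 2;                                       // #2: non-atomic write
    Z2.store(true, std::memory_order_release);
}

void thread3() {
    while (!Z2.load(std::memory_order_acquire))  // makes #2 happen before #3
        ;
    int r = M;                                   // #3: non-atomic read, no data race
    // #2 is the visible side effect with respect to #3, so r == 2.
    (void)r;
}
```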
Either way, though, the devil is in the details. If you actually write this in a program, the important part of the analysis is to verify that whatever your program does to ensure that #2 is "later than" #1 actually imposes a happens-before ordering. Naive or intuitive notions of "later" do not necessarily suffice. For instance, checking the system time around accesses #1, #2, #3 is not good enough. Nor is something like the above example where the extra flag `Z` is accessed with only `relaxed` ordering in one or both places, or worse, is not atomic at all.
If you cannot guarantee, by means of additional operations, that #1 happens before #2 happens before #3, then it is definitely not guaranteed that #3 loads the value `2`, not even if you soup up all of #1, #2, #3 to be `seq_cst`.
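To make that last point concrete, here is the earlier flag pattern with `Z` demoted to `relaxed`. This is a sketch of what is not sufficient:

```cpp
#include <atomic>

std::atomic<int> M{0};
std::atomic<bool> Z{false};

void thread1() {
    M.store(1, std::memory_order_seq_cst);      // #1, even as seq_cst
    Z.store(true, std::memory_order_relaxed);   // relaxed: creates no synchronizes-with
}

void thread2() {
    while (!Z.load(std::memory_order_relaxed))  // relaxed: still no synchronizes-with
        ;
    // There is no happens-before between #1 and #2, so write-write coherence
    // does not apply, and a subsequent read of M is not guaranteed to see 2.
    M.store(2, std::memory_order_seq_cst);      // #2, even as seq_cst
}
```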