The concept of "later" is not one which the C++ standard defines or uses, so literally speaking, this is not a question that can be answered with reference to the standard.
If we instead use the concept of happens before as defined in [intro.races p10], then the answer to your question is yes. (I am using C++20 final draft N4860 here).
`M` has a modification order [intro.races p4], which is a total order. By write-write coherence [intro.races p15], if #1 happens before #2, then the side effect storing the value `1` precedes the store of the value `2` in the modification order of `M`.

Then by write-read coherence [intro.races p17], if #2 happens before #3, then #3 must take its value from #2, or from some side effect which follows #2 in the modification order of `M`. But assuming that the program contains no other stores to `M`, there are no other side effects that follow #2 in the modification order, so #3 must in fact take its value from #2, which is to say that #3 must load the value `2`. In particular, #3 cannot take its value from #1, since #1 precedes #2 in the modification order of `M` and not the other way around. (The modification order is defined as a total order, which means it cannot contain cycles, so it is not possible for #1 and #2 to both precede each other.)
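For concreteness, here is a minimal sketch of the kind of program I take the question to be describing. The function names and the way the three operations are spread across threads are my assumptions, not something fixed by the standard; the question's acquire/release orderings are kept as written:

```cpp
#include <atomic>

std::atomic<int> M{0};

void thread1() {
    M.store(1, std::memory_order_release);      // #1
}

void thread2() {
    // Assumed to run "later" than #1, i.e. #1 happens before #2.
    M.store(2, std::memory_order_release);      // #2
}

void thread3() {
    // Assumed to run "later" than #2, i.e. #2 happens before #3.
    int r = M.load(std::memory_order_acquire);  // #3
    // Given the happens-before chain, r is guaranteed to be 2.
    (void)r;
}
```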
Note that the acquire and release orderings you have attached to operations #1, #2, #3 are totally irrelevant in this analysis, and everything would be the same if they were all `relaxed`. Where memory ordering would matter is in the operations (not shown in your example) that enforce the assumption that #1 happens before #2 happens before #3.
For example, suppose that in Thread 1, operation #1 was sequenced before (i.e. precedes in program order) a store to some other atomic object `Z` of a particular value (let's say `true`). And likewise, that Thread 2 did not perform store #2 until after loading from `Z` and observing that the value loaded was `true`. Then the store to `Z` in Thread 1 must be `release` (or `seq_cst`), and the load of `Z` in Thread 2 must be `acquire` (or `seq_cst`). That way, you get the release store to `Z` to synchronize with the acquire load [atomics.order p2], and by chasing through [intro.races p9-10], you conclude that #1 happens before #2. But again, no particular ordering would be needed on the accesses to `M`.
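A sketch of that arrangement might look like the following. The flag `Z` is the one from the example above; the spin loop standing in for "Thread 2 waits until it sees `true`" is my own choice of mechanism:

```cpp
#include <atomic>

std::atomic<int> M{0};
std::atomic<bool> Z{false};

void thread1() {
    M.store(1, std::memory_order_relaxed);      // #1: relaxed is enough here
    Z.store(true, std::memory_order_release);   // release store to Z
}

void thread2() {
    while (!Z.load(std::memory_order_acquire))  // acquire load of Z
        ;                                       // spin until it reads true
    // The acquire load synchronizes with the release store, so everything
    // sequenced before that store (including #1) happens before this point.
    M.store(2, std::memory_order_relaxed);      // #2: now #1 happens before #2
}
```

The same pattern with a second flag between Thread 2 and Thread 3 would likewise establish that #2 happens before #3.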
In fact, if you have the happens-before ordering, then everything works fine even if `M` is not atomic at all, but is just an ordinary `int`, and #1, #2, #3 are just ordinary non-atomic writes and reads of `M`. The happens-before ordering ensures that these accesses to `M` do not cause a data race in the sense of [intro.races p21], so the behavior of the program is well-defined. We can then refer to [intro.races p13], the "visible side effect" rule. It is true that #2 happens before #3, and there is no other side effect X on `M` such that #2 happens before X and X happens before #3, so #2 is the visible side effect with respect to #3. (In particular, this cannot be true of X = #1, because #1 happens before #2 and therefore not the other way around; by [intro.races p10] the "happens before" relation must not contain a cycle.) Thus, the value determined by #3 must be the value stored by #2, namely the value `2`.
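As a sketch under those assumptions, with `M` as a plain `int` and two atomic flags (the names `Z1` and `Z2` are mine) carrying the happens-before chain:

```cpp
#include <atomic>

int M = 0;                          // ordinary, non-atomic object
std::atomic<bool> Z1{false};
std::atomic<bool> Z2{false};

void thread1() {
    M = 1;                                       // #1: non-atomic write
    Z1.store(true, std::memory_order_release);
}

void thread2() {
    while (!Z1.load(std::memory_order_acquire))  // makes #1 happen before #2
        ;
    M = 2;                                       // #2: non-atomic write
    Z2.store(true, std::memory_order_release);
}

void thread3() {
    while (!Z2.load(std::memory_order_acquire))  // makes #2 happen before #3
        ;
    int r = M;                                   // #3: non-atomic read, no data race
    // #2 is the visible side effect with respect to #3, so r == 2.
    (void)r;
}
```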
Either way, though, the devil is in the details. If you actually write this in a program, the important part of the analysis is to verify that whatever your program does to ensure that #2 is "later than" #1 actually imposes a happens-before ordering. Naive or intuitive notions of "later" do not necessarily suffice. For instance, checking the system time around accesses #1, #2, #3 is not good enough. Nor is something like the above example where the extra flag `Z` is accessed with only `relaxed` ordering in one or both places, or worse, is not atomic at all.
If you cannot guarantee, by means of additional operations, that #1 happens before #2 happens before #3, then it is definitely not guaranteed that #3 loads the value `2`, not even if you soup up all of #1, #2, #3 to be `seq_cst`.
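To make that last point concrete, here is the earlier flag pattern with `Z` demoted to `relaxed`. This is a sketch of what is not sufficient:

```cpp
#include <atomic>

std::atomic<int> M{0};
std::atomic<bool> Z{false};

void thread1() {
    M.store(1, std::memory_order_seq_cst);      // #1, even as seq_cst
    Z.store(true, std::memory_order_relaxed);   // relaxed: creates no synchronizes-with
}

void thread2() {
    while (!Z.load(std::memory_order_relaxed))  // relaxed: still no synchronizes-with
        ;
    // There is no happens-before between #1 and #2, so write-write coherence
    // does not apply, and a subsequent read of M is not guaranteed to see 2.
    M.store(2, std::memory_order_seq_cst);      // #2, even as seq_cst
}
```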