How does mixing relaxed and acquire/release accesses on the same atomic variable affect synchronises-with?

Question

I have a question about the definition of the synchronises-with relation in the C++ memory model when relaxed and acquire/release accesses are mixed on one and the same atomic variable. Consider the following example consisting of a global initialiser and three threads:

int x = 0;
std::atomic<int> atm(0);

[thread T1]
x = 42;
atm.store(1, std::memory_order_release);

[thread T2]
if (atm.load(std::memory_order_relaxed) == 1)
    atm.store(2, std::memory_order_relaxed);

[thread T3]
int value = atm.load(std::memory_order_acquire);
assert(value != 1 || x == 42);  // Hopefully this is guaranteed to hold.
assert(value != 2 || x == 42);  // Does this assert hold necessarily??

My question is whether the second assert in T3 can fail under the C++ memory model. Note that the answer to this SO question suggests that the assert could not fail if T2 used load/acquire and store/release; please correct me if I got this wrong. However, as stated above, the answer seems to depend on how exactly the synchronises-with relation is defined in this case. I was confused by the text on cppreference, and I came up with the following two possible readings.

1. The second assert can fail. The store to atm in T1 could be conceptually understood as storing 1_release, where _release is an annotation specifying how the value was stored; along the same lines, the store in T2 could be understood as storing 2_relaxed. Hence, if the load in T3 returns 2, the thread actually read 2_relaxed; thus, the load in T3 does not synchronise-with the store in T1 and there is no guarantee that T3 sees x == 42. However, if the load in T3 returns 1, then 1_release was read, and therefore the load in T3 synchronises-with the store in T1 and T3 is guaranteed to see x == 42.
2. The second assert succeeds. If the load in T3 returns 2, then this load reads a side-effect of the relaxed store in T2; however, this store of T2 is present in the modification order of atm only if the modification order of atm contains a preceding store with release semantics. Therefore, the load/acquire in T3 synchronises-with the store/release of T1 because the latter necessarily precedes the former in the modification order of atm.

At first glance, the answer to this SO question seems to suggest that my reading 1 is correct. However, that answer differs in a subtle way: all stores in the answer are release, and the crux of the question is to see that load/acquire and store/release establish synchronises-with between a pair of threads. In contrast, my question is about how exactly synchronises-with is defined when memory orders are heterogeneous.

I actually hope that reading 2 is correct since this would make reasoning about concurrency easier. Thread T2 does not read or write any memory other than atm; therefore, T2 itself has no synchronisation requirements and should be able to use relaxed memory order. In contrast, T1 publishes x and T3 consumes it -- that is, these two threads communicate with each other, so they should clearly use acquire/release semantics. In other words, if reading 1 turns out to be correct, then the code of T2 cannot be written by thinking only about what T2 does; rather, the code of T2 needs to know that it should not "disturb" synchronisation between T1 and T3.

In any case, knowing what exactly is sanctioned by the standard in this case seems absolutely crucial to me.

score 4 · Accepted Answer · edited Mar 17 '22 at 14:01

Because T2 uses relaxed ordering on a separate load and store, the release sequence headed by T1's store is broken: the plain store of 2 is not part of it, so the second assert can trigger (although not on a TSO platform such as x86).
You can fix this by either using acq/rel ordering in thread T2 (as you suggested) or by modifying T2 to use an atomic read-modify-write operation (RMW), like this:

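[thread T2]
bool ok;
do {
    int expected = 1;  // reset each iteration: a failed CAS overwrites it with the observed value
    ok = atm.compare_exchange_weak(expected, 2, std::memory_order_relaxed);
} while (!ok);         // retry until the 1 -> 2 exchange succeeds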

The modification order of atm is 0-1-2; whether T3 reads 1 or 2 (or the initial 0), no assert can fail.

Another valid implementation of T2 is:

[thread T2]
if (atm.load(std::memory_order_relaxed) == 1)
{
    atm.exchange(2, std::memory_order_relaxed);
}

Here the RMW itself is unconditional, so it must be accompanied by an if-statement and a (relaxed) load to ensure that the modification order of atm is 0-1 or 0-1-2.
Without the if-statement, the modification order could be 0-2 which can cause the assert to fail. (This works because we know there is only one other write in the whole rest of the program. Separate if() / exchange is of course not in general equivalent to compare_exchange_strong.)
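
The other fix mentioned above, acquire/release ordering in T2, is not shown in code. A minimal sketch of it follows, wrapped in a hypothetical harness function `run_once_acq_rel` (the name, the local variables and the `ok` flag are assumptions for illustration, not part of the original example). Here no release sequence is needed: T1 synchronises with T2, T2 synchronises with T3, and happens-before is transitive.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness: runs T1, T2 and T3 once and returns false if T3
// loaded a non-zero value without also seeing x == 42.
bool run_once_acq_rel() {
    int x = 0;                 // plain, non-atomic payload
    std::atomic<int> atm(0);
    bool ok = true;            // written only by T3, read after join()

    std::thread t1([&] {
        x = 42;
        atm.store(1, std::memory_order_release);
    });
    std::thread t2([&] {
        // Reading 1 with acquire synchronises with T1's release store;
        // the release store of 2 then publishes what T2 has observed.
        if (atm.load(std::memory_order_acquire) == 1)
            atm.store(2, std::memory_order_release);
    });
    std::thread t3([&] {
        int value = atm.load(std::memory_order_acquire);
        // x is read only when value is 1 or 2, so there is no data race.
        if (value != 0 && x != 42) ok = false;
    });

    t1.join(); t2.join(); t3.join();
    return ok;
}
```

Calling `run_once_acq_rel()` repeatedly from `main` should always return true; a passing run does not prove correctness, the guarantee comes from the standard's rules.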

In the C++ standard, the following quotes are related:

[intro.races]
A release sequence headed by a release operation A on an atomic object M is a maximal contiguous subsequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation is an atomic read-modify-write operation.

[atomics.order]
An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.
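
To make the two quotes concrete, here is a small sketch under stated assumptions: the harness `run_once_rmw_chain`, the use of `fetch_add` and the increment of 10 are all made up for illustration. One release store heads the sequence, two relaxed RMWs may extend it, and an acquire load that reads any value from that sequence is guaranteed to see `x == 42`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness: returns false if the reader took its value from the
// release sequence headed by the store of 1 without also seeing x == 42.
bool run_once_rmw_chain() {
    int x = 0;
    std::atomic<int> atm(0);
    bool ok = true;            // written only by the reader, read after join()

    std::thread writer([&] {
        x = 42;
        atm.store(1, std::memory_order_release);  // operation A, heads the sequence
    });
    // Relaxed RMWs: those that follow A in the modification order of atm
    // are part of A's release sequence.
    auto bump = [&] { atm.fetch_add(10, std::memory_order_relaxed); };
    std::thread rmw1(bump);
    std::thread rmw2(bump);
    std::thread reader([&] {
        int value = atm.load(std::memory_order_acquire);
        // Values 0, 10, 20 precede A; values 1, 11, 21 are A itself or an
        // RMW after A, so only those imply synchronisation with the writer.
        if (value % 10 == 1 && x != 42) ok = false;
    });

    writer.join(); rmw1.join(); rmw2.join(); reader.join();
    return ok;
}
```

If `bump` used a plain `atm.store(...)` instead of an RMW, the stored value would no longer belong to the release sequence and the check could legitimately fail.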

This question is about why an RMW works in a release sequence.

answered Mar 17 '22 at 11:16 · LWimsey
edited Mar 17 '22 at 14:01 · Peter Cordes

Thank you for a very quick and clear answer. So I see: relaxed compare_exchange_* is fine, but an ordinary store(x, relaxed) isn’t. I must admit that this is not really obvious, and reading the language of the standard isn’t easiest. In any case, this is really important to know! – Boris Mar 17 '22 at 11:29
Actually, could you please clarify one more thing: if T2 did a load(relaxed) but otherwise used relaxed compare_exchange_weak to modify the value, this wouldn’t interrupt the release sequence, would it? In other words, loads are irrelevant, and we only care about modifications, right? – Boris Mar 17 '22 at 11:49
And one more question: why do we then need compare_exchange_* with acquire/release? I mean, if compare_exchange_* never breaks the release sequence, we should always get the effects of acquire/release with just relaxed, or have I misunderstood something? (Of course, seq_cst would be a different matter.) – Boris Mar 17 '22 at 12:04
When you say forward: what if I didn’t use a while loop? Then I would just read it and perhaps do nothing with the value. The load itself isn’t breaking the sequence, right? – Boris Mar 17 '22 at 12:06
@Boris The RMW in the release sequence (compare_exchange_) does not require ordering (relaxed is fine). I asked a question about why it works this way (linked in the answer). – LWimsey Mar 17 '22 at 12:09
The while loop is necessary so that T2 forwards the correct value. T1: stores(rel) 1, T2: increments 1->2 (RMW, relaxed), T3: loads(acq) 2. That is a correct release sequence. If T3 loads 1 (acq), the synchronization is between T1 & T3 and it does not matter what T2 does – LWimsey Mar 17 '22 at 12:13
Thank you. I still don’t quite understand the point of a while loop and the notion of “correct value”. Say `T2` did `if (atm.load(std::memory_order_relaxed) == 1) { int expected = 1; atm.compare_exchange_weak(expected, 2, std::memory_order_relaxed); }`, that is, `T2` tries to CAS 2 when it sees 1, but gives up on failure. As far as I understand it, asserts in my example can’t fail. (1) If CAS succeeds, `T3` gets 2 and the release sequence stretching to `T1`, so `T3` synchronises-with `T1`. (2) If CAS fails, `T3` gets 1, which was written by `T1`, and so it again synchronises-with `T1`. Am I wrong? – Boris Mar 17 '22 at 12:31
@Boris The while loop in T2 is still necessary (if you want T3 to synchronize on value 2) because `compare_exchange_weak` can 'spuriously' fail (no apparent reason). If the CAS fails, T3 will never observe 2 but still synchronize on 1 (T1->T3). If you use `if (load(relaxed)==1)` in T2 followed by a `compare_exchange_strong`, the while-loop isn't necessary. But don't replace it with a plain store or you are back at square one. – LWimsey Mar 17 '22 at 13:09
I used a while-loop in my answer to make it possible that T1 synchronizes with T3 via T2. Of course, if the while-loop is missing and `atm` is not updated in T2, T1 will still synchronize with T3 (loading value 1) and it is as-if T2 had not done anything. – LWimsey Mar 17 '22 at 13:10
OK. The point I was unsure about was exactly that: in case of failure for any reason, neither the load nor the failed CAS breaks the release sequence. So, T3 synchronises with T1 either directly or through T2’s RMW, and either way it observes `x`. I think that’s clear now. Thank you again for a detailed explanation and for your patience: I really appreciate it! – Boris Mar 17 '22 at 13:27
@Boris np, I will add another (valid) implementation for T2 that uses an unconditional RMW – LWimsey Mar 17 '22 at 13:34
@Boris: In addition, you *could* just use `compare_exchange_strong` with no loop and no `if`, as the most direct equivalent to a single `if() store`, working just like that but with the if/store tied together as a single atomic RMW. No spurious failure, only from actually seeing a `!=1` value with its load. I was a bit surprised this answer chose to change the behaviour of Thread 2 that way (to always eventually store) without mentioning the single CAS attempt way, and was going to comment on it myself before seeing it had already come up in comments. – Peter Cordes Mar 17 '22 at 14:10
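
The single-attempt variant discussed in the comments above could be sketched as follows. This is only an illustration: the harness `run_once_single_cas` and its `ok` flag are hypothetical names, not something from the answer. T2 makes one `compare_exchange_strong` attempt with relaxed ordering; if it succeeds, the value 2 is written by an RMW and so stays in the release sequence headed by T1's store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness: returns false if T3 loaded 1 or 2 without also
// seeing x == 42.
bool run_once_single_cas() {
    int x = 0;
    std::atomic<int> atm(0);
    bool ok = true;            // written only by T3, read after join()

    std::thread t1([&] {
        x = 42;
        atm.store(1, std::memory_order_release);
    });
    std::thread t2([&] {
        int expected = 1;
        // One attempt, no loop and no separate load: the strong CAS fails
        // only if atm really holds something other than 1.
        atm.compare_exchange_strong(expected, 2, std::memory_order_relaxed);
    });
    std::thread t3([&] {
        int value = atm.load(std::memory_order_acquire);
        // x is read only when value is 1 or 2, so there is no data race.
        if (value != 0 && x != 42) ok = false;
    });

    t1.join(); t2.join(); t3.join();
    return ok;
}
```

Unlike the loop version, T2 here may leave `atm` unchanged (for example when it runs before T1); in that case T3 reads 0 or 1 and the asserts still hold.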
