4

Many programming languages today have a happens-before relation and release+acquire synchronization operations.

Some of these programming languages:

I would like to know if release+acquire can violate happens-before:

  • if it's possible, then I would like to see an example
  • if it's impossible, then I would like to get simple and clear explanations why

What are release+acquire and happens-before

Release/acquire establishes a happens-before relation between different threads: in other words, everything before the release in Thread 1 is guaranteed to be visible in Thread 2 after the acquire:

 \     Thread 1                        /            
  \    --------                       /             
   \   x = 1                         / Everything   
    \  y = 2                        /    here...    
     \ write-release(ready = true) /                
      └───────────────────────────┘                 
                   |                                
                   └─────────────┐ (happens-before) 
                                 V                  
                    ┌─────────────────────────┐     
                   / Thread 2                  \    
 ...is visible to /  --------                   \   
    everything   /   read-acquire(ready == true) \  
     here       /    assert(x == 1)               \ 
               /     assert(y == 2)                \
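For example, in C++ the pattern from the diagram could be written roughly like this (a minimal sketch; the variable names and the spin-loop are my own):

#include <atomic>
#include <cassert>
#include <thread>

int x = 0, y = 0;                // plain, non-atomic data
std::atomic<bool> ready{false};  // the synchronization flag

void thread1() {
    x = 1;
    y = 2;
    ready.store(true, std::memory_order_release);   // write-release
}

void thread2() {
    while (!ready.load(std::memory_order_acquire))  // read-acquire, spin until true
        ;
    assert(x == 1);  // guaranteed: both writes happen-before these reads
    assert(y == 2);
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
}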

More than that, happens-before is a strict partial order.
This means it is:

  • transitive: Thread 2 is guaranteed to see not only the writes made by Thread 1, but also all writes by other threads that Thread 1 saw (see the sketch after this list)
  • asymmetric: if a happens-before b, then b cannot happen-before a (the two cannot hold at the same time)
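For instance, here is a sketch of transitivity (my own example: x is a plain int, a and b are std::atomic<int>, everything initially 0):

// Thread 1
x = 1;                                          // plain write
a.store(1, std::memory_order_release);

// Thread 2
while (a.load(std::memory_order_acquire) != 1);
b.store(1, std::memory_order_release);

// Thread 3
while (b.load(std::memory_order_acquire) != 1);
assert(x == 1);  // guaranteed: x = 1 happens-before this read through Thread 2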

Why I think that release/acquire might break happens-before

As we know from the IRIW litmus test, release/acquire can cause two threads to see writes from different threads in different orders (for C++ see also the last example here, and these two examples from the gcc wiki):

// x and y are atomics (e.g. std::atomic<int>), initially 0

// Thread 1
x.store(1, memory_order_release);
// Thread 2
y.store(1, memory_order_release);
// Thread 3
assert(x.load(memory_order_acquire) == 1 && y.load(memory_order_acquire) == 0);
// Thread 4
assert(y.load(memory_order_acquire) == 1 && x.load(memory_order_acquire) == 0);

Here both asserts can pass, which means that Thread 3 and Thread 4 see the writes to x and y in different orders.
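Here is a runnable sketch of this IRIW test (my own harness; on strongly ordered hardware like x86 you will most likely never observe the outcome in a single run, the point is only that the C++ standard permits it):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    bool t3_pass = false, t4_pass = false;

    std::thread t1([] { x.store(1, std::memory_order_release); });
    std::thread t2([] { y.store(1, std::memory_order_release); });
    std::thread t3([&] {
        int a = x.load(std::memory_order_acquire);
        int b = y.load(std::memory_order_acquire);
        t3_pass = (a == 1 && b == 0);   // saw x before y
    });
    std::thread t4([&] {
        int b = y.load(std::memory_order_acquire);
        int a = x.load(std::memory_order_acquire);
        t4_pass = (b == 1 && a == 0);   // saw y before x
    });

    t1.join(); t2.join(); t3.join(); t4.join();
    if (t3_pass && t4_pass)
        std::puts("Thread 3 and Thread 4 saw the writes in different orders");
}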

As I understand it, if these were ordinary variables, then this would violate the asymmetry property of happens-before. But because x and y are atomics, it's OK. (BTW I am not sure about that.)
Nate Eldredge demonstrated in his answer that this IRIW example is OK.

But I still have a sneaking suspicion that there might be something similar to IRIW which would cause Thread 3 and Thread 4 to see regular writes, ordered by happens-before, in different orders — this would break happens-before (it wouldn't be transitive anymore).


Note 1

In cppreference there is also this quote:

The implementation is required to ensure that the happens-before relation is acyclic, by introducing additional synchronization if necessary (it can only be necessary if a consume operation is involved, see Batty et al)

The quote hints that there might be cases where happens-before would be violated unless additional synchronization is introduced ("acyclic" means that happens-before forms a directed acyclic graph, which is equivalent to saying "strict partial order").

If it's possible, I would like to know what these cases are.


Note 2

Since Java allows data races, I'm also interested in cases where happens-before is violated only in the presence of data races.


Edit 1 (03 Nov 2021)

To give you an example, here is an explanation of why sequentially consistent (SC) atomics cannot violate happens-before.
(A similar explanation for release/acquire atomics would be an answer to my question.)

By "violate happens-before" I mean "to violate axioms of happens-before, which is a strict partial order".

Strict partial orders correspond directly to directed acyclic graphs (DAGs).

Here is an example of a DAG from wiki (note that it has no cycles):

[example DAG image]

Let's show that with SC atomics happens-before graph stays acyclic.

Remember that SC atomics happen in a global order (the same for all threads), and:

  • the order is consistent with the order of actions within each thread
  • every SC atomic read sees the latest SC atomic write in this total order to the same variable

Look at this happens-before graph:

 Thread1  Thread2  Thread3 
 =======  =======  ======= 
    │        │        │    
   W(x)      │        │    
    ↓        │        │    
   Sw(a) ┐   │       W(y)  
    │    │   │        ↓    
    │    └> Sr(a)  ┌ Sw(b) 
    │        ↓     │  │    
    │       Sr(b)<─┘  │    
    │        ↓        │    
    │       R(x)      │    
    │        ↓        │    
    │       R(y)      │    
    │        │        │    
    V        V        V    

On the graph:

  • time flows downwards
  • W(x) and R(x) are regular actions: write and read of x
  • Sw(a) and Sr(a) are SC atomics: write and read of a
  • within each thread actions happen in program order (aka sequenced-before in C++): the order in which they appear in the code
  • between threads happens-before is established by SC atomics

Note that arrows on the graph always go downwards
=> the graph cannot have cycles
=> it's always a DAG
=> happens-before axioms cannot be violated

The same proof doesn't work for release/acquire atomics because (as far as I understand) they don't happen in a global order => a HB arrow between Sw(a) and Sr(a) might go upwards => a cycle is possible. (I'm not sure about that)

  • *"But because x and y are atomics it's OK."*. I don't think that's the reason. See the [cppreference](https://en.cppreference.com/w/cpp/atomic/memory_order): *The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.* There is no violation of happens-before, because it was not established between T3 and T4 in the first place (even transitively). – Aivean Nov 01 '21 at 02:31
  • Also, I'm not sure that your question can be answered as it's currently phrased. You are asking for examples where "happens-before" is violated, but the whole point behind the memory model is to **guarantee** that this relation cannot be violated, if established. – Aivean Nov 01 '21 at 02:38
  • @Aivean _There is no violation of happens-before, because it was not established between T3 and T4 in the first place (even transitively)_ There is no happens-before between T3 and T4, but there is happens-before (between T1+T2 and T3) and (between T1+T2 and T4) => this means T3 and T4 see writes in T1+T2 in different order => this breaks transitivity (isn't it?) –  Nov 01 '21 at 02:57
  • @Aivean _Also, I'm not sure that your question can be answered as it's currently phrased. ... the whole point behind the memory model is to guarantee that this relation cannot not violated, if established._ I don't understand **how** `release/acquire` guarantees that happens-before isn't violated => if it's guaranteed then I would like to get simple and clear explanations **how**. –  Nov 01 '21 at 03:09
  • One thing I'm not sure if you're aware is that the happens-before relation is *dynamic*. Given two operations A,B, the answer to which one happens before the other, or whether they are unordered, can in general vary from one run of the program to the next. Of course, for some programs you might be able to prove that one *always* happens before the other, but that is not an inherent feature of the ordering. – Nate Eldredge Nov 01 '21 at 23:37
  • " there is happens-before (between T1+T2 and T3)": I do not know what that means. Happens-before is a partial ordering on *loads and stores*. "T3" isn't a load or a store, it's a thread. Perhaps you mean the loads that T3 does, but there are two of them. And I have no idea what "T1+T2" means at all; how do you add two threads? – Nate Eldredge Nov 01 '21 at 23:51
  • @NateEldredge _"there is happens-before (between T1+T2 and T3)": I do not know what that means_ I meant that in order for the `assert(..)` in `Thread 3` to pass: (1) `x.store()` in `Thread 1` should _happen-before_ `x.load()` in `Thread 3`, and (2) `y.store()` in `Thread 2` should not _happen-before_ `y.load()` in `Thread 3` –  Nov 02 '21 at 00:44
  • @NateEldredge _"One thing I'm not sure if you're aware is that the happens-before relation is dynamic. ... can in general vary from one run of the program to the next."_ Yes, I am aware: `happens-before` is usually created from two other orders: a `program order` (orders actions within each thread, the order of operations as they are written in the program code) and `synchronizes-with` (reflects how synchronization actions from different threads are ordered in runtime, it creates `happens-before` bridges between threads, and it can be different in different runs of the program). –  Nov 02 '21 at 00:50
  • in java, at least, `IRIW` is possible because `release/acquire` does not offer `sequential consistency`, for example. At least, this is how I know it. – Eugene Nov 03 '21 at 03:13
  • @Eugene in C/C++ `IRIW` is possible for the same reason: `release/acquire` atomics do not offer `sequential consistency`. –  Nov 03 '21 at 04:05
  • I get your question now, its just too complicated for me :( sorry – Eugene Nov 03 '21 at 18:20

4 Answers

5

Happens-before is the transitive closure of sequenced-before and synchronizes-with. Sequenced-before is just program order within each thread, and synchronizes-with occurs when an acquire load takes its value from a release store. So in your program, in order to have both assertions pass, the following relations must hold:

  1. T3.x==1 happens before T3.y==0 (sequencing)
  2. T4.y==1 happens before T4.x==0 (likewise)
  3. T1.x=1 happens before T3.x==1 (acquire load takes its value from release store)
  4. T2.y=1 happens before T4.y==1 (likewise)
  5. T1.x=1 happens before T3.y==0 (for transitivity)
  6. T2.y=1 happens before T4.x==0 (likewise)

You may check that this satisfies all the axioms of a partial ordering (antisymmetric and transitive) as well as all the C++ coherence rules implied by both assertions passing. For instance, it must not be the case that T2.y=1 happens before T3.y==0, and indeed there is no such relation in our ordering. But it is also not true that T3.y==0 happens before T2.y=1, and there is nothing wrong with that; it is a partial order after all. T2.y=1 and T3.y==0 are simply unordered.

Since there is a valid happens-before ordering that is consistent with both assertions passing, it is possible that when you run the program, both assertions pass.

It is true that if some happens-before relation existed between T3.y==0 and T2.y=1 in either direction, and likewise between T4.x==0 and T1.x=1, then every combination would result in some violation of the rules: either a violation of coherence, or a cycle in the partial order. But again, it is perfectly fine for them to be unordered, and then no violation occurs.

If the loads and stores were all relaxed, or even if x and y were not atomic at all, then relations 3 and 4 above would not be implied by any rule, and so the happens-before ordering would become simply:

  1. T3.x==1 happens before T3.y==0 (sequencing)
  2. T4.y==1 happens before T4.x==0 (likewise)

This is also consistent with both assertions passing. (In the non-atomic case, the fact that T1.x=1 is unordered with the loads of x would mean there is a data race, and therefore the behavior is undefined in C++. In a different language like Java, we might have defined behavior saying that both loads succeed and return either 0 or 1, but we could still have both assertions pass.) If you think that changing the program to use non-atomic loads and stores would prevent both assertions from passing, you are mistaken.
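For concreteness, a sketch of that relaxed variant (same variables as the question's snippet):

// Thread 1
x.store(1, std::memory_order_relaxed);
// Thread 2
y.store(1, std::memory_order_relaxed);
// Thread 3
assert(x.load(std::memory_order_relaxed) == 1 && y.load(std::memory_order_relaxed) == 0);
// Thread 4
assert(y.load(std::memory_order_relaxed) == 1 && x.load(std::memory_order_relaxed) == 0);

Relations 3 and 4 from the list above are gone, so even fewer happens-before edges constrain the outcome, and both assertions can still pass.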

So having acquire and release actually strengthens the ordering; there are more relations that must hold, and the behavior of the program becomes better defined.

Nate Eldredge
  • You are right: to remove the from-read edge from the happens before order would also ensure the happens before order is acyclic. This technique is also applied with other memory orders like TSO where the happens before order doesn't contain the edge between an earlier store of X and a later load of X on the same CPU. – pveentjer Nov 02 '21 at 09:19
  • Thanks for the explanations for the IRIW test. Could you explain general case: how the properties of `release/acquire` atomics given in the spec guarantee that the partial ordering axioms of `happens-before` cannot be violated? (For instance, for SC atomics I can see how it's guaranteed: there is a global order of all SC atomics => `synchronizes-with` edge always go from an earlier to a later atomic => all happens-before edges go from an earlier to a later action => there can be no cycles in such graph) –  Nov 03 '21 at 00:02
  • @benmambt: It's the spec **itself** that guarantees the partial ordering axioms are not violated. That guarantee stands on its own, it's not a consequence of other requirements. It's an axiom, not a theorem. C++20 intro.races p10: "The implementation shall ensure that no program execution demonstrates a cycle in the 'happens before' relation." Full stop. It doesn't say "... unless release/acquire are used in such-and-such way." – Nate Eldredge Nov 03 '21 at 00:06
  • @benmambt: So if you come up with a program and can prove "if this program prints `0 1`, then the properties of release/acquire would force the happens-before relation to contain a cycle", then you have proved that the program can never print `0 1`. Hence the compiler, unless it is buggy, must not compile it in such a way that it could. – Nate Eldredge Nov 03 '21 at 00:08
  • _"It's the spec itself that guarantees the partial ordering axioms are not violated"_ In case of SC atomics I see how the spec implements that guarantee (that the HB axioms cannot be violated) : SC atomics are executed in global order => there cannot be cycles in a HB graph. So how are the HB guarantees implemented with `release/acquire` atomics then? –  Nov 03 '21 at 00:24
  • _"So if you come up with a program and can prove "if this program prints 0 1, then the properties of release/acquire would force the happens-before relation to contain a cycle""_ This program would be exactly what I would like to get as an answer to my question. This is exactly what I want to find out: whether the properties of `release/acquire` are enough to guarantee happens-before axioms. So the program would be a proof. Of course, another possible answer would be a proof/explanation of how the properties of release/acquire guarantee happens-before axioms. –  Nov 03 '21 at 00:51
  • Ironically, I found the paper ([Time, Clocks and the Ordering of Events in a Distributed System](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/Time-Clocks-and-the-Ordering-of-Events-in-a-Distributed-System.pdf) by Lamport) where the term `happens-before` came from, and in this paper there are `release/acquire` synchronization actions which seem to guarantee no cycles in the HB graph ( in the paper `release` occurs at an earlier physical time than `acquire`). Unfortunately, HB in java and c++ seem to have no alternatives of the physical time property. –  Nov 03 '21 at 01:17
  • I might get myself in a lot of trouble here, but this is sort of obvious if you read carefully the [JLS](https://docs.oracle.com/javase/specs/jls/se17/html/jls-17.html) spec. I am only going to refer to java, since I kind of know a little. JLS says that there is `PO` (program order), `SO` (synchronizes with order) and `HB` (happens before order). The JLS is very explicit in : "If an action x synchronizes-with a following action y, then we also have hb(x, y)." and also : "The source of a synchronizes-with edge is called a release, and the destination is called an acquire.". – Eugene Nov 03 '21 at 03:09
  • So _to me_, strictly speaking about the Java language, establishing an `SO` order that transitively closes `PO` is establishing `HB`. With that in mind, I can't even understand the question. :| – Eugene Nov 03 '21 at 03:11
  • @Eugene the release/acquire that I'm talking about are implemented in java as [VarHandle.setRelease()](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/VarHandle.html#setRelease(java.lang.Object...)) and [VarHandle.getAcquire()](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/VarHandle.html#getAcquire(java.lang.Object...)). They are outside of [the JMM spec](https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html) — for example they definitely don't participate in `SO`. –  Nov 03 '21 at 04:20
  • @Eugene _"I can't even understand the question."_ I can try to explain it to you this way: (1) you mentioned "trasitively closes" — so I suppose you know that `HB` is transitive (i.e. if (a `HB` b) and (b `HB` c) then it means (a `HB` c)) (2) I'm interested in how this transitivity is implemented: e.g. for `SC` atomics (aka `volatile` in java) I can see that transitivity is guaranteed by the properties of the atomics (I added this to my question as 'Edit 1') (3) but for `release/acquire` atomics I don't see how their properties guarantee that `HB` is transitive — and this is what I ask. –  Nov 03 '21 at 07:36
  • @benmambt: So I suppose the question is: "Suppose we deleted the 'no cycles' requirement from the language standard. Is there a program which, while conforming to all remaining rules of the standard, would exhibit a cycle in the HB relation?" I think at this point you might want to ask it as a new question, since this question has already gone in so many different directions. You should probably also restrict it to a single language, since the answer may depend on the precise wording of the language standard. – Nate Eldredge Nov 03 '21 at 14:19
  • @benmambt: In fact, some of the C++ standard's requirements seem to be written assuming the partial order axioms. For instance, some simple counterexamples are ruled out by intro.races p14, that an evaluation B shall take its value from some side effect A such that B does not happen before A. It might have made more sense for them to say "such that A happens before B or such that A and B are unordered". They are equivalent assuming antisymmetry, but not in general, since with transitivity and cycles, A may happen both before and after B. – Nate Eldredge Nov 03 '21 at 18:26
  • @NateEldredge _"Suppose we deleted the 'no cycles' requirement from the language standard"_ I tried to find this requirement in C++ and found [this](https://timsong-cpp.github.io/cppwp/n4868/intro.races#10.2): _"The implementation shall ensure that no program execution demonstrates a cycle in the “happens before” relation. [Note 9: This cycle would otherwise be possible only through the use of consume operations. — end note]"_. As I understand, it means that `release/acquire` atomics cannot produce cycles in `HB` simply because of their properties (the compiler doesn't have to do anything). –  Nov 03 '21 at 21:16
  • Oh, indeed. And actually, I might see how to prove it. The HB relation is transitive by its construction. The sequencing relation is acyclic, so if there is a cycle in HB, it must contain at least one synchronizes-with step, which is, roughly, an acquire load L that takes its value from a release store S. But because of the cycle, L happens before S. Therefore by p14, L must take its value from some store other than S, contradiction. Not quite a proof because I have oversimplified the definition of synchronizes-with from atomics.order p2, but it's a start. – Nate Eldredge Nov 03 '21 at 21:29
  • @NateEldredge I'm not really sure what _"the 'no cycles' requirement"_ in the C++ standard is, in order to ask the new question as you propose. –  Nov 03 '21 at 21:58
  • @NateEldredge _"I might see how to prove it"_ that is just great. I will need some time to understand your explanation and to make sure that it doesn't miss some non-obvious cases. But usually simple explanations like yours work great for me. Maybe you could put it in a answer — I will make it an accepted answer after I figure out your explanations and make sure it's correct. –  Nov 03 '21 at 22:14
  • @benmambt: By "the no-cycles requirement" I mean exactly the sentence that we discussed: "The implementation shall ensure that no program execution demonstrates a cycle in the 'happens before' relation." And what would happen if that sentence were to be deleted. – Nate Eldredge Nov 03 '21 at 22:41
2

The following fragment of the accepted answer has an error:

 Thread1   Thread2     Thread3 
 =======   =======     ======= 
    │         │           │    
    │     ┌ Wrel(a)──┐    │    
    │     │   │      │    │    
  Racq(a)<┘   │      │    │    
    │         ↓      │    │    
    │     ┌ Wrel(b) ┐│    │    
    ↓     │   │     ││    │    
  Racq(b)<┘   │     └─> Racq(b)
    │         │      │    ↓    
    │         │      └> Racq(a)
    │         │           │    
    V         V           V    

(BTW notice that this graph is different from SC: Thread 1 and Thread 3 see Wrel(a) and Wrel(b) in different orders. But edges point downwards nonetheless)

Thread 1 and Thread 3 can see Wrel(a) and Wrel(b) in different orders only when Wrel(a) and Wrel(b) are from different threads.

This is the so-called IRIW (independent reads of independent writes) test.

So the fragment should be changed to this:

Thread1   Thread2   Thread3   Thread4
=======   =======   =======   =======
   │         │         │         │   
   │    ┌ Wrel(a)┐     │         │   
Racq(a)<┘    │   │┌ Wrel(b)┐     │   
   ↓         │   ││    │   └> Racq(b)
Racq(b)<─────│───│┘    │         ↓   
   │         │   └─────│────> Racq(a)
   │         │         │         │   
   V         V         V         V   

(BTW notice that this graph is different from SC: Thread 1 and Thread 4 see Wrel(a) and Wrel(b) in different orders. But edges point downwards nonetheless)


By the way, I could have made this a suggested edit for the accepted answer.

I didn't do it because it's a pretty specialized topic, so the reviewer might not know all the nuances and details, and reject the edit simply because it's not obvious that it really fixes an error.

But I think this is an error worth mentioning and fixing. So I put the fix in this answer, because:

  • people interested in this specific topic might see the fix
  • people might vote for this answer (and the fix) => if there are enough votes, then this will indicate that the fix is correct => the fix will be incorporated into the accepted answer by moderators or people with enough reputation
0

I would like to know if release+acquire can violate happens-before.

The happens-before relationship cannot be "violated", as it is a guarantee. Meaning, if you established it correctly, it will be there, with all its implications (unless there is a bug in the compiler).

However, establishing just any happens-before relationship doesn't guarantee that you've avoided all possible race conditions. You need to establish carefully chosen relationships between the relevant operations that will eliminate all scenarios in which a data race is possible.


Let's review this code snippet:

// Assumption: x and y are atomics (e.g. std::atomic<int>) initialized to 0;
// x_then_y and y_then_x are bools initialized to false.

// Thread 1
x.store(1, std::memory_order_release);

// Thread 2
y.store(1, std::memory_order_release);

// Thread 3
while (x.load(std::memory_order_acquire) != 1);
if (y.load(std::memory_order_acquire) == 0) {
  x_then_y = true;
}

// Thread 4
while (y.load(std::memory_order_acquire) != 1);
if (x.load(std::memory_order_acquire) == 0) {
  y_then_x = true;
}

As you see, T1 and T2 update x and y using memory_order_release memory ordering. T3 and T4 use memory_order_acquire.

Now, to quote cppreference:

Release-Acquire ordering If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

It explicitly says that relations are established only between the store-load pairs in the corresponding threads. No relation between load-load pairs is formed. Back to our example, the happens-before relations would form the following graph:

  T1: x.store ──happens-before──> T3: x.load (and everything sequenced after it in T3)
  T2: y.store ──happens-before──> T4: y.load (and everything sequenced after it in T4)

As you can see, there is no relation between T1 and T2, or T3 and T4, and no total order. T3 and T4 are free to see side-effects from T1 and T2 in any order.


The code snippet above was used on cppreference to illustrate a case when memory_order_acquire + memory_order_release may be too weak to avoid a data race (but that doesn't mean that this example violates happens-before or the memory model; it's just that having happens-before may not be sufficient to avoid all possible data races!).

If memory_order_seq_cst were used for all operations in the example above, then happens-before would additionally have been established between the load operations in T3 and T4.

Meaning that when the first while (x.load() != 1) or while (y.load() != 1) passes, it guarantees that the following x.load or y.load in the other thread will return 1, ensuring that x_then_y and y_then_x cannot both be true at the same time.
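For illustration, the seq_cst variant could look like this (a sketch, reusing the variables from the snippet above):

// Thread 1
x.store(1, std::memory_order_seq_cst);

// Thread 2
y.store(1, std::memory_order_seq_cst);

// Thread 3
while (x.load(std::memory_order_seq_cst) != 1);
if (y.load(std::memory_order_seq_cst) == 0) {
  x_then_y = true;
}

// Thread 4
while (y.load(std::memory_order_seq_cst) != 1);
if (x.load(std::memory_order_seq_cst) == 0) {
  y_then_x = true;
}

All of these operations are now part of a single total order, so at most one of x_then_y and y_then_x can end up true.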


Note: in Java, atomic and volatile operations always work similarly to memory_order_seq_cst for simplicity (sacrificing performance in some cases).

Aivean
  • I didn't say that the IRIW test violates happens-before. I just showed IRIW as something that is very close to violating happens-before — it seems to me that it might be possible to modify IRIW in a way that would allow `T3` and `T4` to see regular writes in different orders (and this would violate happens-before). The problem is that I can neither find an example of how `release+acquire` violates `happens-before`, nor prove that it's impossible. This is exactly what I ask: either an example of how `release+acquire` violates `happens-before`, or a clear and simple proof that it's impossible. –  Nov 01 '21 at 04:19
  • BTW I modified the code snippet to make it simpler and easier to read and understand. –  Nov 01 '21 at 04:26
  • @benmambt It seems that you're under the misconception that happens-before can be "violated". It can't, it would be a bug in the memory model if it could. If you're seeing the unexpected data race or seemingly wrong behavior, that means that you are using the memory model wrong. – Aivean Nov 01 '21 at 04:36
  • @benmambt, your "fixed" code will produce the data race even with `memory_order_seq_cst`. The `while(load != 1)` loop was there for a reason. – Aivean Nov 01 '21 at 04:39
  • To say it in other words: AFAIK `happens-before` is a strict partial order, which means it also forms a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG). I understand how `memory_order_seq_cst` prevents cycles in the DAG: all atomic `memory_order_seq_cst` operations are executed in a total order (so the edges only go forward). But I don't see how the same is guaranteed by `release+acquire` atomics: different threads can see these atomics in different orders, so it seems like they could form cycles and violate `happens-before`. –  Nov 01 '21 at 05:09
  • _It seems that you're under the misconception that happens-before can be "violated"._ **How** exactly is that guaranteed? For `memory_order_seq_cst` there are required HW properties and a proof in [this paper](https://rsim.cs.uiuc.edu/Pubs/ps2pdf/isca90.pdf) (which is the origin of many modern memory models). Is there something similar for `release+acquire`? Because I cannot find it yet — it's just assumed that `release+acquire` guarantees `happens-before` just like `memory_order_seq_cst`, but to me it's not obvious. –  Nov 01 '21 at 05:20
  • _your "fixed" code will produce the data race even with memory_order_seq_cst. The while(load != 1) loop was there for a reason._ Are you sure? I took it from [the GCC wiki](https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync#Acquire.2FRelease). I thought it's the same IRIW test, only simpler. –  Nov 01 '21 at 05:25
  • yes, I'm positive. Events can happen in the following order, without violating any established relationships: `T4.x==0; T1.x=1; T3.x==1; T3.y==0; T2.y=1; T4.y==1` and both asserts will pass. – Aivean Nov 01 '21 at 05:33
  • You are right - it's a mistake. What if I replace (x==0, y==1) with (y==1, x==0) in T4? It seems to me that this will fix it. –  Nov 01 '21 at 05:43
  • Also it's "both asserts can pass" instead of "both asserts can fail". And sorry for bothering you with these edits — I just want to make my question as easy for people to understand as possible (because IMO it's actually a pretty difficult question, so I would like to maximize chances to get an answer) –  Nov 01 '21 at 05:55
  • No, I believe the example in the GCC wiki is wrong. Consider the following sequence of events for their example. `t4.y==0 y=20 t3.y==20 t3.x==0 x=10 t4.x==10` Both asserts will pass. So the statement *"if one assert passes, the other assert must therefore fail."* is wrong. The difference with the example from `cppreference` is that `while` loop ensured that atomic has certain value, before thread proceeds to check the next condition. – Aivean Nov 01 '21 at 05:58
  • Here, I even wrote [a test that verifies that](https://gist.github.com/Aivean/baf74b33f6acff562706d343c184f9db). Both asserts pass and it prints `20 0` and `0 10`. – Aivean Nov 01 '21 at 06:05
  • Yes, I agree that example in GCC wiki is wrong. It seems to me that changing reads' order in `T4` fixes it. In [your test](https://gist.github.com/Aivean/baf74b33f6acff562706d343c184f9db) that would mean replacing `int yy = y.load(...);...;int xx = x.load(...);` with `int xx = x.load(...);...;int yy = y.load(...);` in `func4()` –  Nov 01 '21 at 06:14
  • I agree, that would ensure that either one or both asserts fail (when `memory_order_seq_cst` is used). – Aivean Nov 01 '21 at 06:20
  • OK. I fixed that in the question text. –  Nov 01 '21 at 06:34
  • @benmambt, regarding *"How exactly is that guaranteed? For memory_order_seq_cst there are required HW properties ..."*. Please check [this great answer](https://stackoverflow.com/a/6319356/1349366). Quote: *"the job of the programmer is to write code for the abstract machine; the job of the compiler is to actualize that code on a concrete machine. By coding rigidly to the spec, you can be certain that your code will compile and run without modification on any system with a compliant C++ compiler"*. If your question is about hardware implementation of the memory model, then I cannot help you. – Aivean Nov 01 '21 at 06:37
  • My question is: "It seems to me that `release/acquire` (as it's defined in the specification) might break `happens-before` (as it's defined in the specification), but I'm not sure. So I would like to see either an example of a violation, or a proof (or clear explanation) why it's impossible". BTW in Java `release+acquire` exists outside of the spec and also data races are not "undefined behavior", so it's an especially interesting case. –  Nov 01 '21 at 07:00
0

Answer: no, release/acquire cannot break happens-before.

The proof is given by Nate Eldredge in this comment:

Oh, indeed. And actually, I might see how to prove it. The HB relation is transitive by its construction. The sequencing relation is acyclic, so if there is a cycle in HB, it must contain at least one synchronizes-with step, which is, roughly, an acquire load L that takes its value from a release store S. But because of the cycle, L happens before S. Therefore by p14, L must take its value from some store other than S, contradiction. Not quite a proof because I have oversimplified the definition of synchronizes-with from atomics.order p2, but it's a start.

I just would like to put it in a separate answer to make it easier for people to notice.

Plus here are some additional explanations from me (maybe they will make it easier for some people to understand).


First of all, if we use only release/acquire atomics and non-atomic memory accesses, then happens-before is transitive (and acyclic) by construction.

Similarly to the SC graph, in the case of release/acquire the edges also always point downwards:

Thread1   Thread2   Thread3   Thread4
=======   =======   =======   =======
   │         │         │         │   
   │    ┌ Wrel(a)┐     │         │   
Racq(a)<┘    │   │┌ Wrel(b)┐     │   
   ↓         │   ││    │   └> Racq(b)
Racq(b)<─────│───│┘    │         ↓   
   │         │   └─────│────> Racq(a)
   │         │         │         │   
   V         V         V         V   

(BTW notice that this graph is different from SC: Thread 1 and Thread 4 see Wrel(a) and Wrel(b) in different orders. But edges point downwards nonetheless)

Edges from Wrel(x) to Racq(x) always point downwards because Racq(x) sees Wrel(x) and everything before Wrel(x) as completed (see Notes at the end of the answer).
(In the C++ spec this is called synchronizes-with; you can learn more about it here.)

As a result, similarly to SC graph, all edges always point downwards
=> the graph cannot have cycles
=> it's always a DAG
=> happens-before axioms cannot be violated

Actually, the happens-before produced by release/acquire atomics is basically the original happens-before introduced by Leslie Lamport in Time, Clocks and the Ordering of Events in a Distributed System.
I really recommend the paper to everyone interested in HB: Lamport's explanations are short and clear, and the idea is really cool.


Let's also demonstrate with a picture why cycles aren't possible.

This is what a cycle would look like:

 Thread1    Thread2    Thread3   
 =======    =======    =======   
    │          │          │      
  Racq(a)<─────│──────────│─────┐
    ↓          │          │     │
  Wrel(b) ┐    │          │     │
    │     │    │          │     │
    │     └> Racq(b)      │     │
    │          ↓          │     │
    │        Wrel(c) ┐    │     │
    │          │     │    │     │
    │          │     └> Racq(c) │
    │          │          ↓     │
    │          │        Wrel(a) ┘
    │          │          │      
    V          V          V      

Within each thread happens-before is the order of actions in the source code (it's called sequenced-before in C++ and program order in Java).
Clearly, no HB cycles are possible within a single thread alone.

It means that the edge which "goes back" and closes the cycle must be a synchronizes-with edge like the Wrel(a)->Racq(a) edge in the graph above.

Notice a contradiction:

  • Wrel(a) must complete before Racq(a), because Racq(a) reads the value written by Wrel(a)
  • Racq(a) must complete before Wrel(a), because there is a chain Racq(a)->Wrel(b)->Racq(b)->Wrel(c)->Racq(c)->Wrel(a) where every Wrel(x) (and everything before it) completes before Racq(x) reads it.

This means the Wrel(a)->Racq(a) edge is not allowed => cycles aren't possible.

In terms of the C++ memory model it violates coherence requirements:

The value of an atomic object M, as determined by evaluation B, shall be the value stored by some side effect A that modifies M, where B does not happen before A.
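As a concrete illustration, here is the classic "load buffering" litmus test written with release/acquire (my own sketch; a and b are std::atomic<int> initially 0, r1 and r2 are plain local ints):

// Thread 1
r1 = b.load(std::memory_order_acquire);
a.store(1, std::memory_order_release);

// Thread 2
r2 = a.load(std::memory_order_acquire);
b.store(1, std::memory_order_release);

The outcome r1 == 1 && r2 == 1 is impossible: it would require a.store to synchronize-with a.load and b.store to synchronize-with b.load, which together with sequenced-before gives a.store -> a.load -> b.store -> b.load -> a.store, i.e. a cycle in happens-before. Each load would then happen before the store it reads from, which is exactly what the coherence requirement above forbids.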


Notes.

I stated that:

Racq(x) sees Wrel(x) and everything before Wrel(x) as completed

but in the C++ standard that isn't stated directly.
Instead it has this:

  • happens-before is what defines what a read sees
  • relation between Racq(x) and Wrel(x) is called synchronizes-with
  • happens-before is built from synchronizes-with and a huge number of other rules and orders

It's possible to deduce my statement from the C++ standard, although it might not be easy. (That is why I recommend reading this article instead.)

I used the statement because it concisely describes what memory barrier instructions do, and this is how happens-before can be (and probably is) easily implemented.
So often a memory barrier instruction is all we need to implement happens-before with all its mathematical properties.
For an overview of these instructions on different CPUs I would recommend the corresponding section in Memory Barriers: a Hardware View for Software Hackers by Paul E. McKenney.
(For example, memory barriers in PowerPC basically work the same as release/acquire atomics in C++)
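For example, here is a sketch (my own, with data a plain int and flag a std::atomic<int>, both initially 0) of the same release/acquire synchronization expressed with explicit fences, which maps more directly to such barrier instructions:

// Thread 1
data = 42;                                             // plain write
std::atomic_thread_fence(std::memory_order_release);   // "release barrier"
flag.store(1, std::memory_order_relaxed);

// Thread 2
while (flag.load(std::memory_order_relaxed) != 1);
std::atomic_thread_fence(std::memory_order_acquire);   // "acquire barrier"
assert(data == 42);  // guaranteed: the fence pair gives the same happens-before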

Community