How do global ordering and per-atomic-variable ordering differ?

Question

I am trying to understand the difference between memory_order_seq_cst and memory_order_acq_rel. A few posts on SO already cover this question; however, I don't understand their answers.

This post says:

memory_order_acq_rel provides read and write orderings relative to the atomic variable, while memory_order_seq_cst provides read and write ordering globally.

It also includes an example:

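#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false};
std::atomic<bool> y{false};
std::atomic<int> z{0};  // int, not bool: ++ on a bool was deprecated and is removed in C++17

void a() { x.store(true, std::memory_order_seq_cst); }
void b() { y.store(true, std::memory_order_seq_cst); }
void c() { while (!x.load(std::memory_order_seq_cst)) {} if (y.load(std::memory_order_seq_cst)) ++z; }
void d() { while (!y.load(std::memory_order_seq_cst)) {} if (x.load(std::memory_order_seq_cst)) ++z; }

int main() {
    // kick off a, b, c, d, join all threads
    std::thread ta(a), tb(b), tc(c), td(d);
    ta.join(); tb.join(); tc.join(); td.join();
    assert(z.load() != 0);  // cannot fire when every access is seq_cst
}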

The snippet is a skeleton version of an example on cppreference.com. I compiled the example on Ideone and Coliru. On Ideone, z was either 1 or 2 under both memory_order_acq_rel and memory_order_seq_cst. On Coliru, I only ever got 2. It was never 0.
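Because store() and load() do not accept memory_order_acq_rel (see Cubbi's comments below), a literal search-and-replace of seq_cst by acq_rel does not give a valid program. The following is a minimal sketch of what the release/acquire version presumably looks like, assuming the same four-thread shape as the snippet above; the function names and the run_trial helper are illustrative, not taken from cppreference.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Release/acquire version of the x/y/z example. store() accepts relaxed,
// release or seq_cst; load() accepts relaxed, consume, acquire or seq_cst.
std::atomic<bool> x{false};
std::atomic<bool> y{false};
std::atomic<int> z{0};

void write_x() { x.store(true, std::memory_order_release); }
void write_y() { y.store(true, std::memory_order_release); }

void read_x_then_y() {
    while (!x.load(std::memory_order_acquire)) {}  // spin until x is true
    if (y.load(std::memory_order_acquire)) ++z;
}

void read_y_then_x() {
    while (!y.load(std::memory_order_acquire)) {}  // spin until y is true
    if (x.load(std::memory_order_acquire)) ++z;
}

// Runs one trial and returns z. The standard allows 0, 1 or 2: the two
// readers are not required to agree on which store happened first.
int run_trial() {
    x = false;
    y = false;
    z = 0;
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    return z.load();
}
```

Calling run_trial() in a loop on x86 should only ever produce 1 or 2, because x86 hardware does not let the two readers disagree about the order of the two writes. The outcome 0 is permitted by the C++ model but needs a weaker machine to show up (POWER is the usual example).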

My questions are:

Don't while(!x) and while(!y) guarantee that either if(y) or if(x) evaluates to true, even if we use per-atomic-variable ordering?
How does this example explain the difference between memory_order_acq_rel and memory_order_seq_cst?
Can anyone provide an example using a memory fence to illustrate the difference between per-atomic-variable ordering and global ordering?
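For the third question, one possible fence-based sketch keeps the same shape but makes every access relaxed and puts a std::atomic_thread_fence(std::memory_order_seq_cst) between each reader's two loads. Under my reading of the C++20 wording of [atomics.order], the two fences must be ordered in the single total order S, which rules out z == 0; with memory_order_acq_rel fences there is no such single order and 0 is again allowed. The names fx, fy, fz and run_fence_trial are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// All flag accesses are relaxed; ordering comes only from the fences.
std::atomic<bool> fx{false};
std::atomic<bool> fy{false};
std::atomic<int> fz{0};

void fence_write_x() { fx.store(true, std::memory_order_relaxed); }
void fence_write_y() { fy.store(true, std::memory_order_relaxed); }

void fence_read_x_then_y() {
    while (!fx.load(std::memory_order_relaxed)) {}
    // seq_cst fence: takes part in the single total order S
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (fy.load(std::memory_order_relaxed)) fz.fetch_add(1, std::memory_order_relaxed);
}

void fence_read_y_then_x() {
    while (!fy.load(std::memory_order_relaxed)) {}
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (fx.load(std::memory_order_relaxed)) fz.fetch_add(1, std::memory_order_relaxed);
}

// Runs one trial and returns fz. If the fences were acq_rel instead of
// seq_cst, 0 would be a permitted result.
int run_fence_trial() {
    fx = false;
    fy = false;
    fz = 0;
    std::thread a(fence_write_x), b(fence_write_y);
    std::thread c(fence_read_x_then_y), d(fence_read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    return fz.load();
}
```

The argument runs roughly like this: if reader c saw y as false and reader d saw x as false, c's fence would have to precede d's fence in S (because of y) and d's fence would have to precede c's fence (because of x), which is impossible.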

I doubt there is any common hardware where you can notice the difference between the two. — SergeyA, Jan 05 '16 at 19:36
The inability of actual machines to produce results permitted by some of the memory orderings is not surprising. The full set of memory orderings was designed to permit existing, yet less common, architectures to be interacted with optimally. On top of that, I believe they are only important when you are doing low-level performance tuning on those architectures. As for #1, you test `x` then `y`: if the x and y writes are unordered relative to each other, you can see something happening *after* something that happened *before*. Imagine per-page locks and consistency... — Yakk - Adam Nevraumont, Jan 05 '16 at 20:22
Where did you even put acq_rel in that example? The only RMW is z++, which is irrelevant. — Cubbi, Jan 05 '16 at 20:35
@Cubbi The example has been stripped down to save space. The full implementation can be found on cppreference.com. I replaced every occurrence of seq_cst with acq_rel in that example. — Candy Chiu, Jan 06 '16 at 14:17
@candy chiu that would be UB; see 29.6.5p9 for which memory orders are allowed with store, and p13 for load (cppreference mentions it too). — Cubbi, Jan 06 '16 at 14:36
@Cubbi I have read those, but don't understand it. Can you provide an example when z is 0? — Candy Chiu, Jan 06 '16 at 14:56
store should be release, load should be acquire. You need a non-Intel CPU. — Cubbi, Jan 06 '16 at 15:03
Although I don't understand the difference either, according to this [video](https://www.youtube.com/watch?v=ZQFzMfHIxng), on x86 there is no difference between `memory_order_acq_rel` and `memory_order_seq_cst`. And by default, `std::atomic`'s operations use `memory_order_seq_cst`, so `++z` uses `memory_order_seq_cst`. Even if it is changed to `memory_order_acq_rel`, on x86 it is the same. So, as @Cubbi said, we need a non-Intel CPU to test. — HCSF, Jun 24 '19 at 06:06
@HCSF Even if the same asm code is generated for different orders (or for different language constructs in general), that doesn't make them synonymous in the high-level language. The compiler itself might reorder instructions that produce side effects. — curiousguy, Dec 09 '19 at 02:18
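To make the point in the comments concrete: acq_rel is only a valid argument for read-modify-write operations such as fetch_add, exchange and compare_exchange_*, whereas store() and load() reject it. A small sketch, assuming a plain shared counter (counter, bump_twice and run_counter are hypothetical names):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> counter{0};

void bump_twice() {
    counter.fetch_add(1, std::memory_order_acq_rel);  // OK: an RMW can be both acquire and release
    counter.fetch_add(1);                             // default argument is memory_order_seq_cst
    // counter.store(1, std::memory_order_acq_rel);   // not allowed: violates store()'s precondition
}

// Two threads each add 2, so the result is always 4.
int run_counter() {
    counter = 0;
    std::thread t1(bump_twice), t2(bump_twice);
    t1.join();
    t2.join();
    return counter.load(std::memory_order_acquire);  // acquire is a valid order for load()
}
```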

score 0 · Answer 1 · answered Apr 10 '19 at 12:06

The C++11 standard ([atomics.order]) defines seq_cst as follows:

There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations ...

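Note that these operations may be on different memory locations. With acq_rel (release/acquire) ordering there is no such single order: each atomic object has its own modification order, and ordering between threads is established only where an acquire load reads the value written by a release store to the same location.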

In the example you have provided, I assume that when you say "under memory_order_acq_rel" you mean release stores and acquire loads (store and load do not accept acq_rel itself), and that "under memory_order_seq_cst" means every load and store is seq_cst. The requirement of memory_order_seq_cst imposes a single total order on all the loads and stores of x and y, so either x = true precedes y = true in that order, or y = true precedes x = true. Whichever store comes second, the reader that spun waiting for it is ordered after both stores and therefore also sees the other variable as true, so z is 1 or 2. With release/acquire ordering, on the other hand, the fact that thread c loads x as true and y as false does not prevent thread d from loading y as true and x as false, in which case z is 0. So release/acquire allows z to be 0, 1 or 2.
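The argument above can be checked by brute force. The sketch below is a simplified model, not a simulation of the C++ memory model: for seq_cst it enumerates every total order of the four relevant events (the two writes and each reader's final read), keeping only orders in which each reader's final read follows the write its spin loop waited for. For the per-location case it crudely assumes each reader may independently see the old or new value of the variable it did not wait for. All names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>

// Events: a writes x, b writes y, c reads y (after seeing x), d reads x (after seeing y).
enum Event { kWriteX = 0, kWriteY = 1, kCReadsY = 2, kDReadsX = 3 };

// seq_cst model: one total order S over all four events.
std::set<int> seq_cst_outcomes() {
    // Starts sorted, so next_permutation visits all 24 orders.
    std::array<int, 4> order{kWriteX, kWriteY, kCReadsY, kDReadsX};
    std::set<int> zs;
    do {
        std::array<int, 4> pos{};
        for (int i = 0; i < 4; ++i) pos[order[i]] = i;
        // The spin loops force each reader's final read after the write it waited for.
        if (pos[kCReadsY] < pos[kWriteX] || pos[kDReadsX] < pos[kWriteY]) continue;
        int z = (pos[kWriteY] < pos[kCReadsY]) + (pos[kWriteX] < pos[kDReadsX]);
        zs.insert(z);
    } while (std::next_permutation(order.begin(), order.end()));
    return zs;
}

// Crude per-location model: no single order, so the readers' views are independent.
std::set<int> per_location_outcomes() {
    std::set<int> zs;
    for (int c_sees_y = 0; c_sees_y <= 1; ++c_sees_y)
        for (int d_sees_x = 0; d_sees_x <= 1; ++d_sees_x)
            zs.insert(c_sees_y + d_sees_x);
    return zs;
}
```

seq_cst_outcomes() yields {1, 2} while per_location_outcomes() yields {0, 1, 2}, matching the reasoning in the answer.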

