How sequentially consistent ordering in c++ define a total order?

Question

I am reading the following example from https://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering. And I have a hard time understanding

under what situation assert(z.load() != 0); will fail.
why does using memory_order_seq_cst over memory_order_ack_rel make z never be 0.

#include <thread>
#include <atomic>
#include <cassert>
 
std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};
 
void write_x()
{
    x.store(true, std::memory_order_seq_cst);
}
 
void write_y()
{
    y.store(true, std::memory_order_seq_cst);
}
 
void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst)) {
        ++z;
    }
}
 
void read_y_then_x()
{
    while (!y.load(std::memory_order_seq_cst))
        ;
    if (x.load(std::memory_order_seq_cst)) {
        ++z;
    }
}
 
int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);  // will never happen
}

As far as I under, for either read_x_then_y or read_y_then_x, they could observe a state among:

x = true and y = false
x = false and y = true
x = true and y = true
x = false and y = false

The first 2 cases make z = 1 (eventually 2), the third case makes z = 2, and the last case makes read_x_then_y and read_y_then_x wait until one of `x' and 'y' become true. However, according to the cppreference

This example demonstrates a situation where sequential ordering is necessary. Any other ordering may trigger the assert because it would be possible for the threads c and d to observe changes to the atomics x and y in opposite order.

I don't understand how is that possible. How would the changes to x and y be in the opposite order?

In addition, I am wondering how would the use of memory_order_seq_cst solves the problem. Is it forcing the x.load in read_x_then_y must be executed before y.load?

The question is not how changes to `x` and `y` can _happen_ in opposite order. The question is, how can one thread write them in a certain order, and another thread _observe them to change_ in the opposite order? The answer may lead you down a rabbit hole of [multi-level memory architectures](https://en.wikipedia.org/wiki/CPU_cache#Cache_hierarchy_in_a_modern_processor). — Solomon Slow, Jun 06 '23 at 05:05
It is forcing the effect of `x.load` to be visible to all other processors before the effect of `y.load`. Other orderings do not require effects to be propagated in a way that preserves order. For example, if both accesses are relaxed, then processor 2 might read x from cache, but y from memory, causing it to observe the modified y immediately, but it won't observe the modified x until its cache is invalidated. — Raymond Chen, Jun 06 '23 at 05:06
@SolomonSlow So z might be 0 when read_x_then_y is executed on one core while read_y_then_x is executed on the other core and the cache for these two cores are somehow not consistent at the moment? To be specific, at the very beginning, both x and y are false. Then x (or y) is set to true by one thread and this information is only propagated to read_x_then_y's cache (which makes read_x_then_y reads x = true and y = false), while at the moment y is set to true but this information is only propagated to read_y_then_x's cache (which makes read_y_then_x reads x = true and y = false)? — heturing, Jun 06 '23 at 18:06
@RaymondChen So this might happen when there is a cache incoherence? Can z still equal to 0 on architectures that assure cache coherence?" — heturing, Jun 06 '23 at 18:13
[The Wikipedia page for cache coherence](https://en.wikipedia.org/wiki/Cache_coherence#Definition) notes that cache coherence is weaker than sequential consistency. Cache coherence does not require separate memory locations to be temporally coherent with each other, so the problem can still occur. You need sequential consistency to ensure that the changes to `x` are observable to all processors before the changes to `y`. — Raymond Chen, Jun 06 '23 at 18:57

score 2 · Answer 1 · answered Jun 06 '23 at 18:33

I suggest reading my introduction to memory orders, which explains most of this.

How would the changes to x and y be in the opposite order?

Because by default (in absence of seq_cst) threads only agree on the order of operations on each individual atomic variable. They can disagree on how those operations are interleaved.

So if you remove seq_cst, all threads will merely agree that x and y start as false and are changed to true, but they can disagree which of them is changed first.

In addition, I am wondering how would the use of memory_order_seq_cst solves the problem.

Because for seq_cst operations, all threads agree on how they are interleaved. In other words, there's a single total order (that all threads agree on) in which all seq_cst operations happen.

So if thread read_x_then_y() sees x == true && y == false and does z++, then we've estabilished that x = true "happens before" y = true, so the thread read_y_then_x() has to agree with this observation, and can't see x == false && y == true.

If two threads can disagree on which of x and y is changed first, does it means these two threads are holding a copy of x and y in their own cache? Does "disagree" here mean there can be a moment such that one thread's cache says x = true and y = false while the other thread's cache says y = false and x = true as long as these two threads eventually make sure x = true and y = true at the end of execution? — heturing, Jun 06 '23 at 19:33
@heturing *"does it means these two threads are holding a copy of x and y in their own cache"* I *think* so, but I don't know much on how the memory orders work on the hardware level. *"Does "disagree" here mean there can be a moment such that ..."* Yep. Except that "the end of execution" doesn't hold any special significance. "Eventually" is probably a better word. — HolyBlackCat, Jun 06 '23 at 19:43

score 0 · Answer 2 · answered Jun 06 '23 at 17:30

0

For read_x_then_y:

void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst))
        ;
    if (y.load(std::memory_order_seq_cst)) {
        ++z;
    }
}

The first loop waits until x is loaded as true. Then ++z is executed if y is loaded as true. At this point, y could be either true or false. Both executions are possible.

Likewise, for read_y_then_x, both executions are possible.

What prevents the following?

read_x_then_y loads x as true but y as false
read_y_then_x loads y as true but x as false

In this case, the assert would trigger.

The answer is that this is the guarantee sequentially consistent ordering provides. All such atomic operations share a single total modification order. Either the store to x happens first, or the store to y happens first, in this order. All loads will agree on that order.

Release stores and acquire loads provide much less: any stores before the release stores in the same thread are visible to acquire loads in other threads. So all variables stored to by write_x before the store, would be loadable by read_x_then_y after x is loaded. Of course, there are no such stores.

answered Jun 06 '23 at 17:30

Jeff Garrett

5,863
1
13
12

@JeffGarreet Thank you for your answer. Does it mean that for **read_x_then_y loads x as true but y as false**, the function will be stuck on the while loop forever? And I think the most confusing part for me is why read_x_then_y and read_y_then_x can load different values for x and y. Say x is set to true by write_x first, should not both read_x_then_y and read_y_then_x see x = true and y = false? – heturing Jun 06 '23 at 18:33
This is the root of your misunderstanding: "Say x is set to true by write_x first". With seq_cst, there is a shared ordering so "first" makes sense. Without "seq_cst", there's not, so it was seemingly written first, according to which thread? The other thread can disagree of course. – Jeff Garrett Jun 06 '23 at 18:41
To put it another way, in seq_cst, there is some ordering you can give the atomic operations to match the execution. This is probably more intuitive. Without it, each thread writes and reads independently with only the guarantees of the memory order used. In that world, different threads can disagree on the order of atomic operations. – Jeff Garrett Jun 06 '23 at 18:45
Can you elaborate more on *different threads can disagree on the order of atomic operations*? Does it mean each thread has its own x and y (read from its own L1 cache) and the x and y from one thread can have different values from the x and y in another thread at some moments during the execution as long as both threads eventually make x = ture and y = true? – heturing Jun 06 '23 at 19:03

How sequentially consistent ordering in c++ define a total order?

2 Answers2