C++ memory_order_acquire/release questions

Question

I recently learned about the six C++ memory orders, and I am very confused about memory_order_acquire and memory_order_release. Here is an example from cppreference:

#include <thread>
#include <atomic>
#include <cassert>
 
std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};
 
void write_x() { x.store(true, std::memory_order_seq_cst); }
void write_y() { y.store(true, std::memory_order_seq_cst); }
 
void read_x_then_y() {
     while (!x.load(std::memory_order_seq_cst));

     if (y.load(std::memory_order_seq_cst)) 
         ++z;
}
 
void read_y_then_x() {
     while (!y.load(std::memory_order_seq_cst));

     if (x.load(std::memory_order_seq_cst))
        ++z;
}
 
int main() {
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);

    a.join(); b.join(); c.join(); d.join();

    assert(z.load() != 0);  // will never happen
}

On the cppreference page, it says:

This example demonstrates a situation where sequential ordering is necessary.

Any other ordering may trigger the assert because it would be possible for the threads c and d to observe changes to the atomics x and y in opposite order.

So my question is: why can't memory_order_acquire and memory_order_release be used here? And what semantics do memory_order_acquire and memory_order_release provide?

Some references: https://en.cppreference.com/w/cpp/atomic/memory_order https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync

Some references: https://en.cppreference.com/w/cpp/atomic/memory_order — KamilCuk, Aug 21 '20 at 09:50
Note that StoreLoad reordering without seq_cst is an easier case to demonstrate than IRIW; even x86 with its strongly-ordered memory model can demonstrate StoreLoad in real life (https://preshing.com/20120515/memory-reordering-caught-in-the-act/). In practice it's still rarely necessary. — Peter Cordes, Aug 21 '20 at 10:42

mpoeter · Answer 1 · 2020-08-21T10:48:18.903

Sequential consistency provides a single total order of all sequentially consistent operations. So if you have a sequentially consistent store in thread A, and a sequentially consistent load in thread B, and the store is ordered before the load (in said single total order), then B observes the value stored by A. So basically sequential consistency guarantees that the store is "immediately visible" to other threads. A release store does not provide this guarantee.

As Peter Cordes pointed out correctly, the term "immediately visible" is rather imprecise. The "visibility" stems from the fact that all seq-cst operations are totally ordered, and all threads observe that order. Since the store and the load are totally ordered, the value of a store becomes visible before a subsequent load (in the single total order) is executed.

There exists no such total order between acquire/release operations in different threads, so there is no visibility guarantee. The operations are only ordered once an acquire operation observes the value from a release operation, but there is no guarantee about when the value of the release operation becomes visible to the thread performing the acquire operation.

Let's consider what would happen if we were to use acquire/release in this example:

void write_x() { x.store(true, std::memory_order_release); }
void write_y() { y.store(true, std::memory_order_release); }
 
void read_x_then_y() {
     while (!x.load(std::memory_order_acquire));

     if (y.load(std::memory_order_acquire)) 
         ++z;
}
 
void read_y_then_x() {
     while (!y.load(std::memory_order_acquire));

     if (x.load(std::memory_order_acquire))
        ++z;
}
 
int main() {
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);

    a.join(); b.join(); c.join(); d.join();

    assert(z.load() != 0);  // can actually fire!
}

Since we have no guarantee about visibility, it could happen that thread c observes x == true and y == false, while at the same time thread d could observe y == true and x == false. So neither thread would increment z and the assertion would fire.
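To experiment with this, here is a minimal sketch of a test harness. The name `count_iriw` and its parameters are hypothetical, not part of the original example. It repeats the four-thread experiment and counts how often neither reader incremented z. The memory orders are template parameters so that the compiler sees them as compile-time constants.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the four-thread experiment `iterations` times and
// returns how many runs ended with z == 0 (the readers disagreed on the order).
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
int count_iriw(int iterations) {
    assert(iterations > 0);
    int disagreements = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<bool> x{false};
        std::atomic<bool> y{false};
        std::atomic<int> z{0};

        std::thread a([&] { x.store(true, StoreOrder); });
        std::thread b([&] { y.store(true, StoreOrder); });
        std::thread c([&] {
            while (!x.load(LoadOrder)) {}   // wait until x is seen as true
            if (y.load(LoadOrder)) ++z;
        });
        std::thread d([&] {
            while (!y.load(LoadOrder)) {}   // wait until y is seen as true
            if (x.load(LoadOrder)) ++z;
        });

        a.join(); b.join(); c.join(); d.join();
        if (z.load() == 0) ++disagreements;
    }
    return disagreements;
}
```

Calling `count_iriw<std::memory_order_seq_cst, std::memory_order_seq_cst>(n)` must always return 0. With `std::memory_order_release` and `std::memory_order_acquire` the standard allows a nonzero result, but a result of 0 on your machine proves nothing: most hardware never shows this reordering in practice.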

For more details about the C++ memory model I can recommend this paper which I have co-authored: Memory Models for C/C++ Programmers

@PeterCordes you are right, but in my experience people who are just starting to wrap their heads around this have an easier time with simpler wording. "already visible before the load in the total order of seq_cst operations" is definitely more precise, but also more abstract, and you have to understand the concept of the total order of seq-cst operations. However, I have updated my answer to clarify this. — mpoeter, Aug 21 '20 at 10:44
(rewrote my first comment because this is about [IRIW](https://stackoverflow.com/a/50679223/224132), not [StoreLoad](https://preshing.com/20120515/memory-reordering-caught-in-the-act/) reordering). *immediately visible* is an oxymoron in multithreading. It also isn't obviously relevant to IRIW reordering; we're not looking for StoreLoad where a release store followed by a load reads a value before the store is visible to other threads. "Immediate" visibility doesn't obviously imply a total order, which is all that matters for IRIW. — Peter Cordes, Aug 21 '20 at 10:46
Sorry for the confusion, I just guessed without really looking at the code it was going to be StoreLoad based on making a point about "immediately". Re: your update: there is no "subsequent load" in the threads doing the stores >.< I see you caught that during the 5min edit window, looks good now. — Peter Cordes, Aug 21 '20 at 10:47
@PeterCordes the "subsequent load" refers to a subsequent load in the total order. Tried to clarify a bit more. — mpoeter, Aug 21 '20 at 10:52

ALX23z · Answer 2 · 2020-08-21T09:33:04.397

You can use acquire/release when passing information from one thread to another; this is the most common situation. It does not need sequential consistency.
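A minimal sketch of that common case, with hypothetical names (`payload`, `ready`, `run_message_passing`): one thread writes plain data and then publishes it with a release store, and the other waits with an acquire load before reading the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the payload

void producer() {
    payload = 42;                                  // written before the release store
    ready.store(true, std::memory_order_release);  // publish
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // spin until the store is observed
    // The acquire load that read `true` synchronizes with the release store,
    // so the earlier write to payload is guaranteed to be visible here.
    assert(payload == 42);
}

// Hypothetical driver: runs one producer and one consumer, returns the payload.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
    return payload;
}
```

Only two threads and one flag are involved here, so there is no question of different observers disagreeing about the order of independent stores; that is why acquire/release is enough.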

In this example there are four threads. Two threads perform writes, while the third roughly tests whether x was ready before y and the fourth tests whether y was ready before x. Theoretically, one thread may observe that x was modified before y while another sees that y was modified before x. I am not entirely sure how likely that is. This is an uncommon use case.

Edit: you can visualize the example: assume that each thread runs on a different PC and they communicate via a network. Each pair of PCs has a different ping to each other. Here it is easy to construct an example where it is unclear which event occurred first, x or y, as each PC will see the events occur in a different order.

I am not sure on which architectures this effect may occur, but there are complex ones where two different processors are conjoined. Surely communication between the processors is slower than between the cores of each processor.

Yes, this is a case of IRIW reordering (Independent Reads of Independent Writes). It's possible without seq_cst, but [only happens in real life on PowerPC](https://stackoverflow.com/questions/27807118/will-two-atomic-writes-to-different-locations-in-different-threads/50679223#50679223). And that's because of store-forwarding between SMT threads on the same physical core that makes stores visible before they're *globally* visible. Nothing at all to do with multi-core vs. multi-socket; cache is coherent so stores can't commit to L1d cache until all other cores have invalidated their copy. — Peter Cordes, Aug 21 '20 at 10:08
