
According to C++ Reference, mutex.lock() is a memory_order_acquire operation, and mutex.unlock() is a memory_order_release operation.

However, memory_order_acquire and memory_order_release are only effective for non-atomic and relaxed atomic operations.

memory_order: Release-Acquire ordering on cppreference

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B

Can a mutex in C++ guarantee the visibility of atomic operations? An example is below. Could the load at A be reordered before mu.lock(), so that thread b reads x as false?


#include <thread>
#include <mutex>
#include <atomic>
#include <cassert>
#include <iostream>
#include <unistd.h>

std::atomic<bool> x = {false};
std::mutex mu;

void write_x(){
  mu.lock();
  std::cout << "write_x" << std::endl;
  x.store(true, std::memory_order_release);
  mu.unlock();
}

void read_x() {
  mu.lock();
  std::cout << "read_x" << std::endl;
  assert(x.load(std::memory_order_acquire)); // A
  mu.unlock();
}

int main() {
  std::thread a(write_x);
  usleep(1);
  std::thread b(read_x);

  a.join(); b.join();

  return 0;
}

    *However, memory_order_acquire and memory_order_release are only effective for non-atomic variables.* - no, they order atomic operations as well, including relaxed. – Peter Cordes Aug 23 '20 at 04:31
  • Yes, C++ Reference says _they order non-atomic and relaxed atomic operations_, but they don't order atomic operations. And therefore, could the code `A` read x as false? – Zihe Liu Aug 23 '20 at 04:37
  • No, of course it can't reorder before an acquire operation. Synchronizes-with orders *all* operations, including other acquire and release operations, not just seq_cst, relaxed, or non-atomic. But your code could still fire the assert (at least in theory) if thread `a` is slow to start up and thread `b` takes the lock `mu` first. – Peter Cordes Aug 23 '20 at 04:47
  • @PeterCordes C++ Reference says _If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B._ According to this, storing false to `x` in `a` may not be visible for loading `x` in `b`, because it is atomic_order_acquire, not non-atomic or atomic_order_relaxed. I am new to concurrent C++, and maybe I misunderstand something. – Zihe Liu Aug 23 '20 at 05:06
  • I think when cpp-reference says "relaxed atomic" they mean any order more relaxed than seq_cst. It's a confusing way to phrase it, but fairly obvious from context or if you check the actual ISO C++ standard. This is somewhat common usage, although it's very confusing because of `std::memory_order_relaxed` and should be avoided. – Peter Cordes Aug 23 '20 at 05:06
  • @PeterCordes Oh, it is because I misunderstood "relaxed atomic" in cpp-reference. Thanks so much! – Zihe Liu Aug 23 '20 at 05:18
  • @PeterCordes Uh, I suddenly encountered a new problem. If `x` is loaded and stored by `std::memory_order_seq_cst` and `a` gets lock before `b`, could `b` load `x` as false? Thanks very much. – Zihe Liu Aug 23 '20 at 05:24
  • Yes, I already mentioned that possibility in the 2nd half of [this comment](https://stackoverflow.com/questions/63543388/can-mutex-in-c-guarantee-the-visibility-of-atomic-operations?noredirect=1#comment112363673_63543388). That problem exists regardless of what ordering you use on `x`, or whether it's non-atomic. Spin-waiting for the other thread to have set a data-ready flag (e.g. using an acquire load) is the simple inefficient way to write examples like that. – Peter Cordes Aug 23 '20 at 05:29
  • @PeterCordes Uh, if `a` takes lock `mu` first and `x` is stored and loaded by non-atomic or relaxed atomic ordering, then `b` will load `x` as true. If `a` takes lock `mu` first and `x` is stored and loaded by seq_cst ordering, then `b` may load `x` as false. Are these statements correct? – Zihe Liu Aug 23 '20 at 05:39
  • Oh, I misread your earlier comment. No, your second statement is false. `seq_cst` is ordered wrt a release operation, just like everything else (https://preshing.com/20120913/acquire-and-release-semantics/). That wording on cppreference is really bad, but they maybe think they don't need to mention seq_cst because it already orders *itself* wrt. everything else. Again, this should be obvious because it would be insane if it were otherwise, and obviously violate the way ISO C++ defines things in terms of Synchronizes With. – Peter Cordes Aug 23 '20 at 05:44
  • Oh, I see. _non-atomic and relaxed atomic_ is for emphasizing that release-acquire can make them visible, not to exclude `seq_cst`. Thanks very much for answering so many questions. – Zihe Liu Aug 23 '20 at 06:07
  • @PeterCordes "Pure" `seq_cst` ops (= the ops that are not acq+rel (= the ops that are both acq *and* rel)) are *weakly ordered*. The strongest ordering is provided by real acq+rel ops (that are by definition RMW). – curiousguy Aug 31 '20 at 16:40

1 Answer


TL:DR: "all memory writes" means all of them, not just the kinds mentioned, but the phrasing is confusing. It was probably intended just to point out that even non-atomic and relaxed atomic ops are safely visible across a synchronizes-with, but the phrasing is missing the word "including".


Note that cppreference is a wiki that's intended to explain the standard. It's not normative technical language, and sometimes even explains things in different terms than the ISO C++ standard.

It's generally very good, but don't just assume it's perfect when something seems strange. From surrounding context (and sanity), such as the last sentence of the paragraph saying "everything" with no qualifications, it's still fairly obvious what was meant.


ISO C++ is clear. An acquire operation that "sees" a release operation creates a synchronizes-with relationship. Everything before the release is visible to code after the acquire operation.

So in terms of a model where operations access a globally coherent shared state of memory, acquire operations block everything from reordering before them, including release and seq_cst operations. (Note that this part of cppreference doesn't make any reference to reordering, just to guaranteed visibility or not. Local reordering of accesses to globally coherent state is in practice how real CPUs work, so it's often more convenient to describe things that way, like you're doing in the question.)

This means that C++'s definition of acquire and release matches standard terminology without insane magic exceptions. https://preshing.com/20120913/acquire-and-release-semantics/
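To make that concrete, here is a minimal sketch of my own (not from the question): whatever memory order the earlier writes use, they all become visible once the acquire load observes the release store.

#include <atomic>
#include <cassert>
#include <thread>

int plain = 0;                        // non-atomic
std::atomic<int> relaxed_val{0};      // written with memory_order_relaxed
std::atomic<int> seqcst_val{0};       // written with the default seq_cst
std::atomic<bool> flag{false};

void writer() {
  plain = 1;
  relaxed_val.store(1, std::memory_order_relaxed);
  seqcst_val.store(1);                              // seq_cst
  flag.store(true, std::memory_order_release);      // publishes everything above
}

void reader() {
  while (!flag.load(std::memory_order_acquire)) {}  // spin until it synchronizes-with the release store
  // All three writes are guaranteed visible here, regardless of their own memory order.
  assert(plain == 1);
  assert(relaxed_val.load(std::memory_order_relaxed) == 1);
  assert(seqcst_val.load() == 1);
}

int main() {
  std::thread t1(writer), t2(reader);
  t1.join();
  t2.join();
}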


Note that some people use "relaxed atomics" to describe all orderings weaker than seq_cst. Example: Herb Sutter uses it that way in the talk this question is about.

That might be what was meant in that cppreference definition, but IDK why they'd want to exclude seq_cst. All atomic and non-atomic operations are ordered. So perhaps they did mean mo_relaxed, and just wanted to point out that even those are ordered / visible.

(seq_cst could be said to already order itself wrt. everything else, so "of course" it's ordered with respect to acquire and release operations. But that reason seems unlikely.)


If it was intended to emphasize that weaker orderings are also ordered, they should have written "including non-atomic and relaxed atomic". Without the word "including", that phrasing can be read as implying only non-atomic and relaxed atomic. Only an understanding of the big picture, and of what would or wouldn't be sane, can give you a correct reading.

Technical writing that needs to be precisely understood will often use the phrase "including but not limited to".


Also note that your example can still trigger the assert, just not for the reason you were worried about.

If thread a is slow to start up, thread b could enter its critical section first and print + read x before the print+store in the other thread happens.

The usual way to write toy examples like that is a loop that spins on an acquire load until it sees a value, e.g. a flag like data_ready stored by a release operation after the store you care about. That way you know the read side runs after an acquire operation that synced-with a release operation in the write side.
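As an illustration (my own sketch, not from the question; the data_ready name is just for this example), the question's code could be restructured so that thread b can't reach its critical section until it has synchronized with thread a, at which point the assert can no longer fire:

#include <thread>
#include <mutex>
#include <atomic>
#include <cassert>

std::atomic<bool> x = {false};
std::atomic<bool> data_ready = {false};
std::mutex mu;

void write_x() {
  mu.lock();
  x.store(true, std::memory_order_release);
  mu.unlock();
  data_ready.store(true, std::memory_order_release);      // signal: the store to x has happened
}

void read_x() {
  while (!data_ready.load(std::memory_order_acquire)) {}  // wait until write_x's critical section is done
  mu.lock();
  assert(x.load(std::memory_order_acquire));              // can't fire: we synchronized with write_x
  mu.unlock();
}

int main() {
  std::thread a(write_x);
  std::thread b(read_x);
  a.join();
  b.join();
  return 0;
}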

Peter Cordes