
http://en.cppreference.com/w/cpp/atomic/memory_order, and other C++11 online references, define memory_order_acquire and memory_order_release as:

  • Acquire operation: no reads in the current thread can be reordered before this load.
  • Release operation: no writes in the current thread can be reordered after this store.

This seems to allow post-acquire writes to be executed before the acquire operation, which seems weird to me (the usual acquire/release semantics restrict movement of all memory operations).

The same online source (http://en.cppreference.com/w/cpp/atomic/atomic_flag) suggests that a spinlock mutex can be built using C++ atomics and the above-mentioned acquire/release orderings:

lock mutex: while (lock.test_and_set(std::memory_order_acquire))

unlock mutex: lock.clear(std::memory_order_release);               
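
Put together, a minimal sketch of the complete spinlock this describes might look like the following (the class name spinlock and member name flag are mine; the two operations are exactly the ones quoted above):

  #include <atomic>

  // Spinlock built from std::atomic_flag, following the cppreference example.
  class spinlock {
      std::atomic_flag flag = ATOMIC_FLAG_INIT;
  public:
      void lock() {
          // Acquire: spin until test_and_set returns false, i.e. the flag was clear.
          while (flag.test_and_set(std::memory_order_acquire))
              ;  // busy-wait
      }
      void unlock() {
          // Release: publish everything done while holding the lock.
          flag.clear(std::memory_order_release);
      }
  };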

With this definition of lock/unlock, wouldn't the simple code below be broken if memory_order_acquire/release are indeed defined this way (i.e., do not forbid reordering of post-acquire writes):

Thread1:
  (0) lock
    (1) x = 1;
    (2) if (x != 1) PANIC
  (3) unlock

Thread2:
  (4) lock
    (5) x = 0;
  (6) unlock

Is the following execution possible: (0) lock, (1) x = 1, (5) x = 0, (2) PANIC? What did I miss?
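
(For reference, a compilable version of the two threads above, using the atomic_flag spinlock from the cppreference snippet; the labels (0)-(6) are kept as comments, and the variable and function names are mine.)

  #include <atomic>
  #include <cstdio>
  #include <cstdlib>
  #include <thread>

  std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;
  int x = 0;  // ordinary, non-atomic shared variable

  void lock()   { while (lock_flag.test_and_set(std::memory_order_acquire)) ; }
  void unlock() { lock_flag.clear(std::memory_order_release); }

  void thread1() {
      lock();                       // (0)
      x = 1;                        // (1)
      if (x != 1) {                 // (2)
          std::puts("PANIC");
          std::abort();
      }
      unlock();                     // (3)
  }

  void thread2() {
      lock();                       // (4)
      x = 0;                        // (5)
      unlock();                     // (6)
  }

  int main() {
      std::thread t1(thread1), t2(thread2);
      t1.join();
      t2.join();
  }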

asked by Cedomir Segulja
  • How do you think this is possible? What is the precise order of events (including the locks and unlocks) that you imagine? – David Schwartz Apr 23 '13 at 22:21
  • 1
    I've added lock in the trace above. I imagine that the post-acquire-write at (5) can be executed before (4). – Cedomir Segulja Apr 23 '13 at 22:27
  • `release` means "I'm done now and here is the indicator" and `acquire` means "are you done? look at the indicator" – curiousguy May 28 '19 at 01:13
  • You missed that `test_and_set` is a Read-Modify-Write operation, for which there are special rules that you haven't taken into account. – Carlo Wood Sep 07 '19 at 20:47
  • @CarloWood "_for which there are special rules_" Which rules? – curiousguy Dec 18 '19 at 07:53
  • For the record, https://en.cppreference.com/w/cpp/atomic/memory_order wording is fixed now. IDK how long after this question was asked. – Peter Cordes Dec 18 '19 at 14:31
  • @curiousguy So, search for the string 'read-modify-write' on the page that Peter just mentioned ;). – Carlo Wood Dec 20 '19 at 16:39
  • @CarloWood "_`test_and_set` ... for which there are special rules_" I don't see what additional guarantee exists for `test_and_set` that would contribute to that program – curiousguy Dec 20 '19 at 18:56

1 Answer


The spinlock mutex implementation looks okay to me. I think they got the definitions of acquire and release completely wrong.

Here is the clearest explanation of acquire/release consistency models that I am aware of: Gharachorloo; Lenoski; Laudon; Gibbons; Gupta; Hennessy: Memory consistency and event ordering in scalable shared-memory multiprocessors, Int'l Symp Comp Arch, ISCA(17):15-26, 1990, doi 10.1145/325096.325102. (The doi is behind the ACM paywall. The actual link is to a copy not behind a paywall.)

Look at Condition 3.1 in Section 3.3 and the accompanying Figure 3:

  • before an ordinary load or store access is allowed to perform with respect to any other processor, all previous acquire accesses must be performed, and
  • before a release access is allowed to perform with respect to any other processor, all previous ordinary load and store accesses must be performed, and
  • special accesses are [sequentially] consistent with respect to one another.

The point is this: acquires and releases are sequentially consistent¹ (all threads globally agree on the order in which acquires and releases happened.) All threads globally agree that the stuff that happens between an acquire and a release on a specific thread happened between the acquire and release. But normal loads and stores after a release are allowed to be moved (either by hardware or the compiler) above the release, and normal loads and stores before an acquire are allowed to be moved (either by hardware or the compiler) to after the acquire.

(Footnote 1: This is true for most implementations, but an overstatement for ISO C++ in general. Reader threads are allowed to disagree about the order of 2 stores done by 2 other threads. See Acquire/release semantics with 4 threads, and this answer for details of how C++ compiled for POWER CPUs demonstrates the difference in practice with release and acquire, but not seq_cst. But most CPUs only pass data between cores via coherent cache, which means a global order does exist.)
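
To make those allowed movements concrete, here is a small sketch of my own (not from the paper or the standard) annotating which reorderings an acquire/release pair permits:

  #include <atomic>

  std::atomic<bool> locked{false};  // stands in for the lock word
  int a = 0, b = 0;                 // ordinary data (for illustration only)

  void example() {
      a = 1;                        // before the acquire: may be moved to AFTER it
      while (locked.exchange(true, std::memory_order_acquire))
          ;                         // acquire: nothing below may be moved above this

      b = a + 1;                    // inside the critical section: guaranteed to be seen
                                    // between this acquire and the release below

      locked.store(false, std::memory_order_release);
                                    // release: nothing above may be moved below this
      a = 2;                        // after the release: may be moved to BEFORE it
  }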


In the C++ standard (I used the Jan 2012 draft) the relevant section is 1.10 (pages 11 through 14).

The definition of happens-before is intended to be modeled after Lamport; Time, Clocks, and the Ordering of Events in a Distributed System, CACM, 21(7):558-565, Jul 1978. C++ acquires correspond to Lamport's receives, C++ releases correspond to Lamport's sends. Lamport placed a total order on the sequence of events within a single thread, where C++ has to allow a partial order (see Section 1.9, Paragraphs 13-15, page 10 for the C++ definition of sequenced-before.) Still, the sequenced-before ordering is pretty much what you would expect. Statements are sequenced in the order they are given in the program. Section 1.9, paragraph 14: "Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated."

The whole point of Section 1.10 is to say that a program that is data-race-free produces the same well-defined values as if the program were run on a machine with sequentially consistent memory and no compiler reordering. If there is a data race then the program has no defined semantics at all. If there is no data race then the compiler (or machine) is permitted to reorder operations that don't contribute to the illusion of sequential consistency.

Section 1.10, Paragraph 21 (page 14) says: A program is not data-race-free if there is a pair of accesses A and B from different threads to object X, at least one of those accesses has a side effect, and neither A happens-before B, nor B happens-before A. Otherwise the program is data-race-free.
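
A minimal example of such a race (my own illustration): two threads touch the same non-atomic object, one of them writes, and nothing orders the accesses, so neither happens-before the other:

  #include <thread>

  int counter = 0;  // ordinary, non-atomic object

  int main() {
      // One write and one read of 'counter' with no synchronization and no
      // happens-before edge between them: a data race, so undefined behaviour.
      std::thread writer([] { counter = 1; });
      std::thread reader([] { int r = counter; (void)r; });
      writer.join();
      reader.join();
  }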

Paragraphs 6-20 give a very careful definition of the happens-before relation. The key definition is Paragraph 12:

"An evaluation A happens before an evaluation B if:

  • A is sequenced before B, or
  • A inter-thread happens before B."

So if an acquire is sequenced before (in the same thread) pretty much any other statement, then the acquire must appear to happen before that statement. (Including if that statement performs a write.)

Likewise: if pretty much any statement is sequenced before (in the same thread) a release, then that statement must appear to happen before the release. (Including if that statement just does a value computation (read).)

The reason that the compiler is allowed to move other computations from after a release to before a release (or from before an acquire to after an acquire) is that those operations specifically do not have an inter-thread happens-before relationship (because they are outside the critical section). If they race, the semantics are undefined, and if they don't race (because they aren't shared) then you can't tell exactly when they happened with regard to the synchronization.
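
As a sketch of that last point (my own example): work on data no other thread can observe may legally drift across the unlock, and no conforming program could tell the difference:

  #include <mutex>

  std::mutex m;
  int shared = 0;  // protected by m

  int f() {
      int local = 0;       // never escapes this function, so never shared
      m.lock();
      shared += 1;         // must appear to happen before the unlock
      local = shared * 2;  // loads 'shared' inside the critical section
      m.unlock();
      local += 3;          // may be computed before the unlock: no other thread can tell
      return local;
  }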

Which is a very long way of saying: cppreference.com's definitions of acquire and release are dead wrong. In your example, (1) is sequenced before the release at (3), the release at (3) synchronizes with the acquire at (4) once Thread 2's test_and_set observes the cleared flag, and (4) is sequenced before (5), so (1) happens-before (5). Your example program has no data race, and PANIC cannot occur.

answered by Wandering Logic
  • Thanks. I'm familiar with the paper - this is the source of my confusion: acquire access/operation semantics usually guarantee that both loads and stores that are after (in program order) do indeed appear as executed after the acquire op. cppreference however seems to allow writes to appear as executed before. – Cedomir Segulja Apr 24 '13 at 02:02
  • I'm still not entirely convinced though. The modifications to `x` are not to an atomic object, and therefore are not "visible side effects." As a result I think a compiler could reorder or remove operations to x due to the "as-if rule" 1.9/1. (Although in this particular example it is more likely that the compiler would optimize out x and the call to PANIC entirely, given that constant-propagating x into the `if` test allows removal of the `PANIC` call entirely, and the store in thread 2 has no side effects.) – Billy ONeal Apr 24 '13 at 05:07
  • Modifications to `x` are visible side effects. It's just a question of _when_ the side effects become visible to other threads. I think of it as: Release operations "broadcast" the visibility of non-atomic side-effects to other threads. Acquire operations "receive" the visibility of those non-atomic side effects. So when @CedomirSegulja's Thread 1 performs (0) lock it now must believe that either Thread 2 has not yet reached (4) or that Thread 2 has passed (6). – Wandering Logic Apr 24 '13 at 11:30
  • I'll rewrite that cppreference page (because I'm guilty of initially writing it based on someone's blog notes). – Cubbi Apr 24 '13 at 14:43
  • @Cubbi: I think all formulations of the type "No reads/writes in the reader/writer thread can be reordered before/after the atomic load/store" should be removed. First, I think they are wrong (e.g. there could be reads in the reader that could be reordered before the atomic load without breaking the acq-rel semantics). Second, as a programmer, I don't care what reordering the compiler/hardware may do to boost performance; I care about the fact that statements (sequenced-)before the release in threadA will appear to be executed before the statements that are (sequenced-)after the acquire done by threadB. – Cedomir Segulja Apr 24 '13 at 16:05
  • Ok, I wouldn't say I don't care about optimizations :), just that I first need to know the semantics and only then what are the optimizations allowed under these semantics. – Cedomir Segulja Apr 24 '13 at 16:07
  • @Cedomir Segulja I agree now, and I think reorders have their place when talking about why certain asserts may fail (e.g. 29.3/10, which is where the standard brings them up, too). – Cubbi Apr 24 '13 at 17:45
  • @CedomirSegulja E.g. the compiler can see that an automatic variable whose address is never made available to any other function (the kind that can usually be allocated to registers) cannot be examined by other threads, and it can always reorder accesses to such variables around mutex operations, atomic operations and fences. – curiousguy May 28 '19 at 12:36
  • @BillyONeal "_The modifications to x are not to an atomic object, and therefore are not "visible side effects."_" Are you saying that actions on atomics are visible side effects? "_As a result I think a compiler could reorder or remove operations to x due to the "as-if rule" 1.9/1_" Are you saying that actions on atomics are not subject to the so called "as-if rule"? (There is no as-if rule.) – curiousguy May 28 '19 at 12:41
  • *The point is this: acquires and releases are sequentially consistent (all threads globally agree on the order in which acquires and releases happened.)* - That's true on most implementations but *not* guaranteed by ISO C++ unless you use `seq_cst`, not just acquire and release. [Will two atomic writes to different locations in different threads always be seen in the same order by other threads?](https://stackoverflow.com/a/50679223) explains how it can happen in practice on POWER / PowerPC. – Peter Cordes Aug 23 '20 at 05:30