Are acquire/release semantics really enough for implementing critical sections?

Question

(Here, by critical section, I mean any synchronization mechanism that prevents concurrent access to some resource.)

It seems like the consensus on the web is that you only need acquire semantics when entering a critical section and release semantics when leaving it. But doesn't this open up the possibility of a deadlock?

Here is some pseudo-code to explain what I mean. Here is the original code:

Thread 1:
    enter A // acquire semantics
    // ... some work within A
    leave A // release semantics

    enter B // acquire semantics
    // ... some work within B
    leave B // release semantics

Thread 2:
    enter B // acquire semantics
    // ... some work within B
    leave B // release semantics

    enter A // acquire semantics
    // ... some work within A
    leave A // release semantics

When executing this code, the CPU could legally transform it into this (nothing moves in front of acquires, nothing moves behind releases):

Thread 1:
    enter A // acquire semantics
    enter B // acquire semantics
    // ... some work within A
    // ... some work within B
    leave A // release semantics
    leave B // release semantics

Thread 2:
    enter B // acquire semantics
    enter A // acquire semantics
    // ... some work within B
    // ... some work within A
    leave B // release semantics
    leave A // release semantics

But now, we have a deadlock hazard which wasn't here before! Two threads are entering more than one critical section, but in a different order.

So don't critical sections need to prevent store/load reordering as well? I.e. don't they need sequentially consistent semantics instead of just acquire/release? Why is this not specified

Compile-time reordering isn't allowed to invent deadlocks in C++, that would violate the "finite time" requirement. [How C++ Standard prevents deadlock in spinlock mutex with memory\_order\_acquire and memory\_order\_release?](https://stackoverflow.com/q/61299704) And run-time reordering can't move a whole spin-wait loop that waits for a lock to be available; its first iteration or two might see a lock not available, but not indefinitely. — Peter Cordes, May 21 '23 at 10:38

score 1 · Answer 1 · answered Nov 03 '18 at 01:25

1

The "moves" interpretation of acquire/release is a useful guide, but can break down because it describes how a thread sees other thread's actions. Each thread must see its own actions as happening in order. For example, Thread 2 could see this sequence:

Thread 1 acquires A
Thread 2 acquires B

but subsequently Thread 2 will see itself as releasing B before acquiring A. Conversely, during the same run, Thread 1 could see it as:

Thread 2 acquires B
Thread 1 acquires A

but subsequently Thread 1 will see itself as releasing A before acquiring B.

answered Nov 03 '18 at 01:25

Arch D. Robison

3,829
2
16
26

Wouldn't this mean processors are forbidden from reordering instructions, even if it would not change the end result of a single-threaded program? Because otherwise, why would thread 1 always see itself as releasing A before taking B? Does this mean memory ordering issues only affect the order of propagation of local cache to shared memory, but not the order of instructions themselves? Obviously, I'm not an expert, I'm just trying to get a better understanding of what is and isn't possible with memory reordering in general. – relatively_random Nov 04 '18 at 09:32
1

A processor must see its own instructions execute in order. It's other processor's instructions that it might see happen out of order, and two processors might disagree on the total order in which their instructions happened. I.e. it's not safe to reason using a single global order of instructions. – Arch D. Robison Nov 04 '18 at 20:15
Then the only way I can figure out about how the processor could approach out of order execution is with a sort of queue: it enqueues the taking of the second lock as soon as the first one is acquired, but it also enqueues the rest of the work it needs to do with the first lock. So if taking the second lock fails, it pushes it back to the end of the queue, but there are still other things to do. This will eventually release the first lock. Is this interpretation sort of on the right track? Basically, that execution order isn't a static thing, but changes dynamically according to circumstances? – relatively_random Nov 05 '18 at 07:43
Most processors do not attempt out of order acquisition of a lock because the "undo" aspect is tricky. However, there is something called "speculative lock elision", implemented in some Intel processors, that has the "undo" logic. See https://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions . – Arch D. Robison Nov 07 '18 at 03:49
@relatively_random: Instruction reordering isn't the only thing that can cause memory reordering (from the PoV of other threads). Stores *can't* become globally visible when they execute, because they're still speculative at that point and might have to get rolled back. Thus a store buffer. [Can a speculatively executed CPU branch contain opcodes that access RAM?](https://stackoverflow.com/q/64141366) and [Can CPU Out-of-Order-Execution cause memory reordering?](https://stackoverflow.com/q/71768672) (loads starting in order can still receive values out of order thanks to cache hit vs. miss) – Peter Cordes May 21 '23 at 10:32

score 1 · Accepted Answer · answered May 21 '23 at 10:24

Acquire and release are from the release consistency model (RC).
In RC there are 2 types of memory operations:

ordinary reads and writes
special operations, which are divided into 2 classes:
- sync (used for synchronization): acquire and release
- nsync (not used for synchronisation): the same as memory_order_relaxed in C++

This only applies to ordinary reads and writes:

nothing moves in front of acquires, nothing moves behind releases

Special operations have their own reordering rules.
There are 2 types of RC:

with sequentially consistent special operations (RCsc)
with processor consistent special operations (RCpc)

Here are the allowed reorderings according to the wiki:

For sequential consistency (RCsc), the constraints are:

acquire → all,

all → release,

special → special.

For processor consistency (RCpc) the write to read program order is relaxed, having constraints:

acquire → all,

all → release,

special → special (except when special write is followed by special read).

Note: the above notation A → B, implies that if the operation A precedes B in the program order, then program order is enforced.

In case of RCsc special operations aren't allowed to reorder, so the reordering in your example isn't legal.

In case of RCpc it's a little bit more complicated.
Entering a critical section is actually 2 special operations executed atomically:

an acquire read of a synchronization variable
a write to the same synchronization variable which marks the lock as taken and prevents other threads from entering this critical section

As a result:

entering a critical section includes the special write
leaving a critical section is a release (i.e. also a special write)

Special writes aren't allowed to reorder in RCpc, so the reordering in your example isn't legal in RCpc.

Thanks, moving the accepted mark to your answer because it explains how locks work within the model itself. I wasn't aware that special operations have their own extra constraints in the model. — relatively_random, May 21 '23 at 12:09

Are acquire/release semantics really enough for implementing critical sections?

2 Answers2