(Here, by critical section, I mean any synchronization mechanism that prevents concurrent access to some resource.)
It seems like the consensus on the web is that you only need acquire semantics when entering a critical section and release semantics when leaving it. But doesn't this open up the possibility of a deadlock?
Here is some pseudo-code to explain what I mean. The original code:
Thread 1:
enter A // acquire semantics
// ... some work within A
leave A // release semantics
enter B // acquire semantics
// ... some work within B
leave B // release semantics
Thread 2:
enter B // acquire semantics
// ... some work within B
leave B // release semantics
enter A // acquire semantics
// ... some work within A
leave A // release semantics
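To make the pseudo-code concrete, here is roughly how I would write it in C++. This is only a sketch under my own assumptions: `SpinLock`, `enter`, `leave`, `run_once` and the two counters are names I made up for illustration, not taken from any library, and a real critical section would likely block instead of spinning. The only point is that entering uses `std::memory_order_acquire` and leaving uses `std::memory_order_release`, as in the comments above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical minimal critical section: acquire on entry, release on exit.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void enter() {
        // exchange() returns the previous value, so we spin until we are
        // the thread that changed the flag from false to true.
        while (locked.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void leave() {
        locked.store(false, std::memory_order_release);
    }
};

SpinLock A, B;
int guarded_by_A = 0;  // only touched while holding A
int guarded_by_B = 0;  // only touched while holding B

void thread1() {
    A.enter();
    ++guarded_by_A;    // some work within A
    A.leave();
    B.enter();
    ++guarded_by_B;    // some work within B
    B.leave();
}

void thread2() {
    B.enter();
    ++guarded_by_B;    // some work within B
    B.leave();
    A.enter();
    ++guarded_by_A;    // some work within A
    A.leave();
}

// Runs both threads to completion and reports whether each counter
// was incremented exactly once by each thread.
bool run_once() {
    guarded_by_A = 0;
    guarded_by_B = 0;
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    return guarded_by_A == 2 && guarded_by_B == 2;
}
```

In source order neither thread ever holds both locks at once, so `run_once()` should return true as soon as both threads have finished.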
When executing the original code, the CPU could legally transform it into this (nothing after an acquire moves before it, and nothing before a release moves after it):
Thread 1:
enter A // acquire semantics
enter B // acquire semantics
// ... some work within A
// ... some work within B
leave A // release semantics
leave B // release semantics
Thread 2:
enter B // acquire semantics
enter A // acquire semantics
// ... some work within B
// ... some work within A
leave B // release semantics
leave A // release semantics
But now we have a deadlock hazard that wasn't there before! Each thread holds more than one critical section at a time, and the two threads enter them in opposite orders.
So don't critical sections need to prevent store/load reordering as well, i.e. a later acquire load moving ahead of an earlier release store? In other words, don't they need sequentially consistent semantics instead of just acquire/release? Why is this not specified