Why doesn't __ATOMIC_SEQ_CST prevent CPU reordering?

Question

According to the GCC documentation: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

__ATOMIC_SEQ_CST Enforces total ordering with all other __ATOMIC_SEQ_CST operations.

Now let's look at the following example:

int test(int* num)
{
    int m = __atomic_load_n(num, __ATOMIC_SEQ_CST);
    int z = __atomic_load_n(num, __ATOMIC_SEQ_CST);
}

Compiling with x86-64 gcc 12.2 on https://godbolt.org/ produces the following assembly:

test:

    push    rbp
    mov     rbp, rsp
    mov     QWORD PTR [rbp-24], rdi
    mov     rax, QWORD PTR [rbp-24]
    mov     eax, DWORD PTR [rax]
    mov     DWORD PTR [rbp-4], eax
    mov     rax, QWORD PTR [rbp-24]
    mov     eax, DWORD PTR [rax]
    mov     DWORD PTR [rbp-8], eax
    nop
    pop     rbp
    ret

Now my question is: what prevents the CPU from reordering these two atomic operations?

Naturally aligned integer assignment is always atomic on x86. See https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic-on-x86/36685056 — Emanuel P, Feb 22 '23 at 13:11
@EmanuelP It's atomic, but what prevents CPU reordering, i.e. the load into z being performed before the load into m? — Moshe Levy, Feb 22 '23 at 13:35
I linked an equivalent question. The x86 architectural memory model promises that LoadLoad reordering does not occur. — Nate Eldredge, Feb 22 '23 at 15:02
Also, I don't think any architecture can reorder loads and stores of *the same variable*. The issue is about operations on different variables. The only reordering that x86 can do is StoreLoad, where a store to `x` is reordered with a later load from `y`. If you write that in C, you will find the compiler inserting a barrier instruction of some kind (`mfence`, or a `lock`ed RMW which happens to be a full barrier on x86). — Nate Eldredge, Feb 22 '23 at 15:04
@NateEldredge: GCC doesn't optimize between atomic operations. `x = 1;` will compile to `xchg` (equivalent to `mov` + `mfence`), regardless of whether there are any other atomic operations before an `_exit` system call. (Anything else could potentially do an atomic SC load in another function.) As you say, it's only necessary to prevent StoreLoad reordering between SC stores and SC loads, not between weaker stores and SC loads, so https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html only requires extra barriers with pure stores on x86. — Peter Cordes, Feb 22 '23 at 18:38
@NateEldredge I wrote code that does a normal store followed by an atomic load, and I don't see any memory ordering being enforced: `int m = 0; int main() { int y; m = 1; __atomic_load_n(&y, __ATOMIC_SEQ_CST); }` We have a store followed by a load here, so what prevents the load from y from happening before the write to m? — Moshe Levy, Mar 02 '23 at 10:26
It looks like __ATOMIC_SEQ_CST doesn't guarantee that StoreLoad reordering is prevented. — Moshe Levy, Mar 02 '23 at 12:09
@MosheLevy: The C memory model only forbids StoreLoad reordering if they are *both* seq_cst, and the GCC atomics are the same. GCC typically implements this by *following* every seq_cst store with a barrier. In your example the CPU might reorder the store and the load, but since the store isn't an atomic seq_cst store, this is allowed. — Nate Eldredge, Mar 02 '23 at 13:59
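
The distinction drawn in the comments can be put side by side in a small sketch. All names here (`flag`, `data`, `plain`, and the three functions) are hypothetical and not from the question. The instruction choices mentioned in the code comments are what GCC typically emits on x86-64 and may differ between versions and flags.

```c
#include <assert.h>

int flag = 0;   /* written with a seq_cst store */
int data = 0;   /* read with a seq_cst load */
int plain = 0;  /* written with an ordinary, non-atomic store */

/* Both operations are seq_cst, so the store may not be reordered after the
 * load (no StoreLoad reordering). On x86-64, GCC typically implements the
 * store as xchg, or as mov followed by mfence; the load is a plain mov. */
int seq_cst_store_then_load(void)
{
    __atomic_store_n(&flag, 1, __ATOMIC_SEQ_CST);
    return __atomic_load_n(&data, __ATOMIC_SEQ_CST);
}

/* Only the load is seq_cst. The plain store carries no ordering guarantee,
 * so no barrier is required and the CPU may let the load complete first. */
int plain_store_then_load(void)
{
    plain = 1;
    return __atomic_load_n(&data, __ATOMIC_SEQ_CST);
}

/* Single-threaded sanity check. It cannot observe any reordering: the
 * difference between the two functions shows up only in the generated
 * assembly, or in what another thread is allowed to see. */
void check_single_threaded(void)
{
    assert(seq_cst_store_then_load() == 0);
    assert(plain_store_then_load() == 0);
    assert(flag == 1 && plain == 1);
}
```

Pasting the first two functions into Compiler Explorer with `-O2` should show a barrier-like instruction only in `seq_cst_store_then_load`, which matches the explanation that GCC follows every seq_cst store with a barrier rather than placing one before seq_cst loads.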
