Why doesn't a gcc atomic store with __ATOMIC_RELEASE generate a memory barrier?

Question

Pikus' "The Art of Writing Efficient Programs" (p. 208) provides a spinlock implementation using C++ atomic variables. I adapted it to use the gcc atomic builtins (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). I was surprised to see that my unlock function compiles to code without a memory barrier:

spin_unlock:    
# spinlock.c:112:   __atomic_store_n(s, 0, __ATOMIC_RELEASE);
    mov DWORD PTR [rdi], 0  #,* s,
    ret

I thought that memory barriers are only effective when applied in pairs. The lock function (using an atomic exchange with __ATOMIC_ACQUIRE) relies on the two implicit memory barriers in xchg, but I would have expected gcc (on x86_64) to generate a lock mov or another xchg for a store with __ATOMIC_RELEASE. Why doesn't it?
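For reference, a minimal sketch of the lock/unlock pair described above. Only `spin_unlock` and its `__atomic_store_n` line come from the listing; the `spinlock_t` typedef, the `spin_lock` name and the inner read-only spin loop are assumptions made for illustration, not the code from the book.

```c
/* Hypothetical lock word: 0 = free, 1 = held. */
typedef int spinlock_t;

/* Acquire: atomically swap in 1; we own the lock if the old value was 0. */
static void spin_lock(spinlock_t *s)
{
    while (__atomic_exchange_n(s, 1, __ATOMIC_ACQUIRE) != 0) {
        /* Assumed refinement: wait on plain loads until the lock looks
           free, then retry the exchange. */
        while (__atomic_load_n(s, __ATOMIC_RELAXED) != 0) {
        }
    }
}

/* Release: the store from the listing above (plain mov on x86-64). */
static void spin_unlock(spinlock_t *s)
{
    __atomic_store_n(s, 0, __ATOMIC_RELEASE);
}
```

The release store in `spin_unlock` pairs with the acquire exchange in `spin_lock`: writes made inside the critical section become visible to the next thread that acquires the lock.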

Every load and store on x86-64 is already strong enough for acq/rel. `exchange(ACQUIRE)` doesn't need the extra ordering that comes with the implicit `lock` prefix; it only needs the RMW atomicity. But x86 doesn't have a way to do atomic RMWs weaker than `SEQ_CST`. You only want `xchg` for a `SEQ_CST` store; otherwise, pure loads and pure stores are trivial. - Peter Cordes, May 29 '23 at 10:34
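The point of this comment can be checked by compiling a few one-line wrappers and reading the assembly (for example with `gcc -O2 -S`). The function names below are made up for illustration, and the instructions in the comments are what gcc typically emits on x86-64; the exact output depends on the compiler version and flags.

```c
/* Release store: typically a plain mov, as in the question's listing. */
void store_release(int *p)
{
    __atomic_store_n(p, 0, __ATOMIC_RELEASE);
}

/* Seq-cst store: typically xchg (or mov followed by mfence). */
void store_seq_cst(int *p)
{
    __atomic_store_n(p, 0, __ATOMIC_SEQ_CST);
}

/* Acquire load: typically a plain mov. */
int load_acquire(int *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* Acquire exchange: xchg, which is implicitly locked and so stronger
   than acquire requires. */
int exchange_acquire(int *p)
{
    return __atomic_exchange_n(p, 1, __ATOMIC_ACQUIRE);
}
```

Only the seq-cst store needs an instruction with full-barrier semantics; the release store and the acquire load get their ordering from the x86-64 memory model itself.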
