Pikus' "The Art of Writing Efficient Programs" (p. 208) provides a spinlock implementation using C++ atomic variables. I adapted it to use the gcc atomic builtins (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). I was surprised to see that my unlock function compiles to code without a memory barrier:
spin_unlock:
# spinlock.c:112: __atomic_store_n(s, 0, __ATOMIC_RELEASE);
mov DWORD PTR [rdi], 0 #,* s,
ret
I thought that memory barriers are only effective if applied pairwise. The lock function (using an atomic exchange with __ATOMIC_ACQUIRE) relies on the two implicit memory barriers in xchg, but I would have expected that gcc (on x86_64) generates a lock mov or another xchg for a store with __ATOMIC_RELEASE. Why not?
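For reference, a minimal sketch of the two functions being discussed. Only the label spin_unlock and the __atomic_store_n line come from the compiler output above; the spinlock_t typedef, the spin_lock name, and the inner relaxed-load loop are assumptions for illustration and may differ from the code actually compiled.

```c
/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
typedef int spinlock_t;

/* Acquire: atomically store 1 and inspect the previous value.
 * A previous value of 0 means this caller took the lock. */
void spin_lock(spinlock_t *s)
{
    while (__atomic_exchange_n(s, 1, __ATOMIC_ACQUIRE) != 0) {
        /* Assumed refinement: wait on plain loads until the lock
         * looks free before retrying the exchange. */
        while (__atomic_load_n(s, __ATOMIC_RELAXED) != 0)
            ;
    }
}

/* Release: the store shown in the assembly listing above. */
void spin_unlock(spinlock_t *s)
{
    __atomic_store_n(s, 0, __ATOMIC_RELEASE);
}
```

Compiling this with gcc -O2 -S -masm=intel -fverbose-asm on x86_64 should make it possible to compare the generated instructions for both functions side by side.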