No, barriers alone aren't sufficient. You also need `volatile` for access to shared data if you want to roll your own atomics the way the Linux kernel does, relying on the behaviour of a few known compilers (instead of portable ISO C facilities like `<stdatomic.h>`).
In GNU C, a compiler memory barrier like `asm("" ::: "memory")` can force the compiler to emit asm that re-reads a non-`volatile` `int` instead of just keeping its value in a register. So for example you might think you could write `while (b != 1) asm("" ::: "memory");` to force re-reads of `b` in the spin loop, unlike your broken code, which will optimize like `if (b != 1) { while(true){} }`.
But that's not the only thing you have to worry about: without `volatile`, the compiler is allowed to invent reads. e.g. if you do `int tmp = shared_var;` and then use `tmp` multiple times, the compiler might decide it's cheaper to just reload `shared_var` again for one of the later uses. So your program might act like `tmp` changed value, leading to inconsistent behaviour.
See the LWN article *Who's afraid of a big bad optimizing compiler?* for that and many more possible problems; it explains why Linux code needs `WRITE_ONCE` / `READ_ONCE` or `ACCESS_ONCE` macros to do a `volatile` access to a shared variable. If you're rolling your own version of that, you don't need the variable itself to be declared `volatile`; it's sufficient to do `*(volatile int*)&b = 1;`. (That lets you access it efficiently in phases of your program where it's not shared. Or for a struct, when an instance of that struct isn't shared.)
GCC and clang do at least de-facto define the behaviour of `volatile` for multi-threading use-cases (to something like `_Atomic` with `memory_order_relaxed`), to support the Linux kernel and legacy code written before C++11 / C11 stdatomic gave us a portable, well-defined way to do all this stuff. Normally you should just use `stdatomic.h` functions in C, or the GNU C `__atomic` builtins. (Or the obsolete `__sync` builtins if you insist.) But `volatile` does still work on many known implementations, like GCC, clang, ICC, and MSVC, if you know exactly what you're doing and get everything right.
`volatile` makes GCC do the access with a single full-width access if it can. e.g. for the example of a non-atomic `int64_t` store on AArch64 with a constant where high half = low half, plain code can use `stp` of two 32-bit halves, but with `volatile` GCC generates the full 64-bit value in one register, so you get atomicity when it can happen for free with a type of that width. (It doesn't go out of its way to do 64-bit atomicity on 32-bit targets, though, unlike `__atomic_load_n`, which will use SSE2 or MMX to do a 64-bit load on 32-bit x86.)
ISO C doesn't guarantee anything about `volatile`; data-race UB still applies to it. But all real-world systems where we run multiple threads of the same process on multiple cores have cache-coherent shared memory between those cores, so a `volatile` load or store that actually happens in the asm does give visibility. No run-time ordering, though.
BTW, `__sync_synchronize` is a very expensive definition for a read-memory-barrier or write-memory-barrier (acquire/release fences that don't need to block StoreLoad reordering). `__sync_synchronize` is actually a full barrier, like `atomic_thread_fence(memory_order_seq_cst);`.