return *(uint32_t*)(m_head_memory_location + offset);
You cast to non-atomic non-volatile uint32_t* and dereference!!!
The compiler is allowed to assume that this uint32_t object isn't written by anything else (i.e. assume no data-race UB), so it can and will hoist the load out of the loop, effectively transforming the loop into something like: if((val=load) == 0) infinite_loop();
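For contrast, here is a minimal sketch of a spin-wait that the compiler may not hoist, assuming the shared word can be declared as (or placed in memory as) a std::atomic<uint32_t>. The helper name wait_for_nonzero is hypothetical, not something from your code.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: spin until the shared word becomes nonzero.
// Because the object is atomic, every iteration performs a real load;
// the compiler cannot assume the value stays the same between iterations.
inline std::uint32_t wait_for_nonzero(const std::atomic<std::uint32_t>& head) {
    std::uint32_t val;
    while ((val = head.load(std::memory_order_acquire)) == 0) {
        // busy-wait; a real program might pause or yield here
    }
    return val;
}
```

On x86 the acquire load compiles to a plain mov inside the loop, so this costs nothing extra in asm compared to the broken version.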
A GCC memory barrier will force a reload, but this is an implementation detail of std::atomic_thread_fence(std::memory_order_acquire). For x86, that barrier only needs to block compile-time reordering, so a typical implementation for GCC might be asm("" ::: "memory").
It's not the acquire ordering that's doing anything; it's the memory clobber that stops GCC from assuming another read will read the same thing. That's not something ISO C++ std::atomic_thread_fence(std::memory_order_acquire) implies for non-atomic variables. (And it's always implied for atomic and volatile.) So like I said, this would work in GCC but only as an implementation detail.
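To make the "implementation detail" concrete, this is roughly what relying on the memory clobber looks like. It's a GNU-specific sketch (GCC/Clang inline asm), the function name is made up for illustration, and it is still data-race UB in ISO C++ if another thread writes the word, so treat it as a demonstration rather than a recommendation.

```cpp
#include <cstdint>

// Hypothetical helper showing the GNU-specific trick only.
// The empty asm statement with a "memory" clobber tells GCC that any
// memory may have changed, so *p must be reloaded on the next iteration.
// It emits no instructions; it only constrains the optimizer.
inline std::uint32_t spin_with_compiler_barrier(const std::uint32_t* p) {
    std::uint32_t val;
    while ((val = *p) == 0) {
        asm volatile("" ::: "memory");  // compile-time barrier, not ISO C++
    }
    return val;
}
```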
It's also strict-aliasing UB if this memory is ever accessed with types other than this and char*, or if the underlying memory was declared as a char[] array. If you got a char* from mmap or something then you're fine.
It's also possible misalignment UB unless offset is known to be a multiple of 4. (Although unless GCC chooses to auto-vectorize, this won't bite you in practice on x86.)
You can solve these two for GNU C with typedef uint32_t unaligned_u32 __attribute__((may_alias, aligned(1))); but you still need volatile or atomic<T> for reading in a loop to work.
In general
Use std::atomic_thread_fence(std::memory_order_acquire); as required by the C++ memory model; that's what governs reordering at compile time.
When compiling for x86, it won't turn into any asm instructions; in asm it's a no-op. But if you don't tell the compiler it can't reorder something, your code might break depending on compiler optimization level.
You might get lucky and have the compiler do a non-atomic load after an atomic mo_relaxed load, or it might do the non-atomic load earlier if you don't tell it not to.
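As a minimal sketch of that pattern: a relaxed load of a flag, then an acquire fence, then the non-atomic read. The Mailbox struct and the publish / try_consume names are hypothetical, just to give the fence something to order.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared structure: plain data guarded by an atomic flag.
struct Mailbox {
    std::uint32_t payload = 0;        // non-atomic data
    std::atomic<bool> ready{false};   // set by the writer after payload
};

// Writer: store the data, then publish it with a release store.
inline void publish(Mailbox& m, std::uint32_t value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Reader: relaxed load of the flag, then an acquire fence before
// touching the non-atomic payload. The fence keeps the compiler (and,
// on weakly-ordered ISAs, the CPU) from doing the payload load early.
inline bool try_consume(Mailbox& m, std::uint32_t& out) {
    if (!m.ready.load(std::memory_order_relaxed))
        return false;
    std::atomic_thread_fence(std::memory_order_acquire);
    out = m.payload;
    return true;
}
```

Without the fence, nothing stops the payload load from being ordered before the flag load, which is exactly the "might get lucky" case.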