I've been trying to wrap my head around atomics in C++, and I noticed something in a simple example that doesn't make sense to me.
Compiling the following sample C++ code...
#include <atomic>

std::atomic<int> a, b, c;

void variable_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    a.store(1, std::memory_order_release); // release
    b.store(456, std::memory_order_relaxed);
}

void fence_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release); // release
    a.store(1, std::memory_order_relaxed);
    b.store(456, std::memory_order_relaxed);
}
... results in essentially identical assembly for both GCC (11.2) and Clang (13.0.0) when compiled with -O3 -march=native:
variable_release():
        mov     DWORD PTR b[rip], 123
        mov     DWORD PTR c[rip], 2
        mov     DWORD PTR a[rip], 1
        mov     DWORD PTR b[rip], 456
        ret
fence_release():
        mov     DWORD PTR b[rip], 123
        mov     DWORD PTR c[rip], 2
        mov     DWORD PTR a[rip], 1
        mov     DWORD PTR b[rip], 456
        ret
c:
        .zero   4
b:
        .zero   4
a:
        .zero   4
I can understand the assembly generated for fence_release, given the following quote from cppreference:
an atomic_thread_fence with memory_order_release ordering prevents all preceding writes from moving past all subsequent stores.
This would seem to imply that because c is written to after b.store(123, ...), the second store to b can't be reordered above c.store(...), and thus there is no way to avoid the first b.store(123, ...) altogether.
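To make that reasoning concrete, here is a minimal sketch of a hypothetical observer. The function `observer_sees_c` is a name I made up for illustration and is not part of my original test. As far as I understand the fence rules, a thread that reads 456 from b and then issues an acquire fence must also see c == 2, which is why the second store to b has to stay below the store to c.

```cpp
#include <atomic>

std::atomic<int> a, b, c;

void fence_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release); // release
    a.store(1, std::memory_order_relaxed);
    b.store(456, std::memory_order_relaxed);
}

// Hypothetical observer (illustration only), meant to run on another thread.
// Returns false only if it sees the second store to b without also seeing
// the store to c, which the release fence should rule out.
bool observer_sees_c() {
    if (b.load(std::memory_order_relaxed) != 456)
        return true; // second store to b not observed yet; nothing to check
    std::atomic_thread_fence(std::memory_order_acquire);
    return c.load(std::memory_order_relaxed) == 2;
}
```

If the compiler hoisted b.store(456, ...) above c.store(2, ...), this observer could return false, so the fence version seems to need both stores to b.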
However, I don't understand why the first store to b still occurs in variable_release. I would expect that the second store to b can freely move up, above the store to c, and thus the redundant store could be eliminated.
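Spelled out by hand, the transformation I have in mind would look roughly like this. The name `variable_release_elided` is hypothetical and only for illustration. It is what I'd expect an optimizer to be allowed to produce, not something either compiler actually emits, and whether it is really equivalent is part of what I'm asking.

```cpp
#include <atomic>

std::atomic<int> a, b, c;

// Hand-written sketch of the expected result (an assumption, not compiler output).
void variable_release_elided() {
    // b.store(123, relaxed) dropped: assumed to be a dead store once the
    // later store to b has been hoisted above the release store.
    b.store(456, std::memory_order_relaxed); // hoisted from after a.store(...)
    c.store(2, std::memory_order_relaxed);
    a.store(1, std::memory_order_release);   // release
}
```

The final values of a, b, and c are the same as in variable_release; only the intermediate value 123 in b disappears.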
Can the write b.store(123, ...) be elided in either of these functions? Am I misunderstanding something, or is this a missed compiler optimization?