
I've been trying to wrap my head around atomics in C++, and while trying to understand them, I noticed something with a simple example that doesn't make sense to me.

Compiling the following sample C++ code:

#include <atomic>

std::atomic<int> a, b, c;

void variable_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    a.store(1, std::memory_order_release); // release
    b.store(456, std::memory_order_relaxed);
}

void fence_release() {
    b.store(123, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release); // release
    a.store(1, std::memory_order_relaxed);
    b.store(456, std::memory_order_relaxed);
}

... results in essentially identical assembly for both GCC (11.2) and Clang (13.0.0), when compiled with -O3 -march=native:

variable_release():
        mov     DWORD PTR b[rip], 123
        mov     DWORD PTR c[rip], 2
        mov     DWORD PTR a[rip], 1
        mov     DWORD PTR b[rip], 456
        ret
fence_release():
        mov     DWORD PTR b[rip], 123
        mov     DWORD PTR c[rip], 2
        mov     DWORD PTR a[rip], 1
        mov     DWORD PTR b[rip], 456
        ret
c:
        .zero   4
b:
        .zero   4
a:
        .zero   4

I can understand the assembly generated for fence_release, given the following quote from cppreference:

an atomic_thread_fence with memory_order_release ordering prevents all preceding writes from moving past all subsequent stores.

This would seem to imply that the preceding b.store(123, ...) can't be reordered past the subsequent b.store(456, ...), so the two stores can't be merged, and there is no way to avoid the first b.store(123, ...) altogether.

However, I don't understand why the first store to b still occurs in variable_release. I would expect that the second store to b could freely move up, above the store to c and next to the first store, at which point the redundant first store could be eliminated.
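To make it concrete, here is a sketch of the transformation I would expect the compiler to be allowed to perform (the names a2/b2/c2 are hypothetical stand-ins for the globals above; this is what I think could be emitted, not what compilers actually do):

```cpp
#include <atomic>

std::atomic<int> a2, b2, c2; // stand-ins for the globals a, b, c above

// Hypothetical optimized form: b.store(456) is hoisted above the release
// store (legal for a relaxed store that follows a release) and above the
// relaxed store to c, and the now-dead b.store(123) is eliminated.
void variable_release_optimized() {
    b2.store(456, std::memory_order_relaxed); // first store to b elided
    c2.store(2, std::memory_order_relaxed);
    a2.store(1, std::memory_order_release); // release
}
```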

Can the write b.store(123, ...) be elided in either of these functions? Am I misunderstanding something or is this a missing compiler optimization?
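For reference, this is the kind of acquire-side reader I have in mind when reasoning about what the optimization must preserve (a self-contained sketch; aR/bR are hypothetical stand-ins for the globals a and b):

```cpp
#include <atomic>

std::atomic<int> aR, bR; // stand-ins for the globals a and b above

// If this thread observes a == 1 via acquire, release/acquire
// synchronization guarantees it sees some store to b that is ordered
// before the release: 123 or 456 in the original code, only 456 if the
// first store were elided. Nothing obliges it to ever observe 123,
// which is why eliding that store should be legal.
int read_b_if_released() {
    if (aR.load(std::memory_order_acquire) == 1)
        return bR.load(std::memory_order_relaxed);
    return -1; // release store not yet observed
}
```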

  • Title: yes. But you won't see it at runtime on x86 where in asm every store is at least `release` (which is why the release barrier is zero instructions). And you don't see it at compile time because compilers don't optimize atomics. [Why don't compilers merge redundant std::atomic writes?](https://stackoverflow.com/q/45960387) – Peter Cordes Feb 08 '22 at 18:01
  • @PeterCordes Thank you for the clarification. – noshwins Feb 08 '22 at 18:06
  • 1
    BTW, that definition of release fence is insufficient: it also prevents LoadStore reordering. But it *does* correctly talk about blocking reordering with any later stores, rather than just one-way wrt. the fence itself, [which would *not* be strong enough](https://stackoverflow.com/a/70193424/224132). – Peter Cordes Feb 08 '22 at 18:10
  • @PeterCordes The title may be answerable by "yes", but I think the question at the end is not. If I just remove the line with the store, then it could happen that a thread acquiring `a` followed by a read of `b` sees neither of the two stores to `b`, which would not be correct. At least the optimization would need to move the second `b` store before the release. – user17732522 Feb 08 '22 at 18:11
  • @user17732522: Yes, that reordering (dead store elimination of the first store after moving the later store up next to it) is the missed-optimization the question is asking about, which the compiler isn't doing because they never optimize atomics, treating `atomic` basically like `volatile atomic` as well as the other ordering semantics. – Peter Cordes Feb 08 '22 at 18:14
  • @PeterCordes Oops, sorry. I must have skipped the line where OP already explains this in detail. – user17732522 Feb 08 '22 at 18:15
  • 1
    @user17732522: Yup, I've missed my share of details when skimming. :P I think this question has correct reasoning about everything, so the only missing piece is that compilers choose not to optimize, so it's a duplicate. – Peter Cordes Feb 08 '22 at 18:17

0 Answers