Why do relaxed atomic operations prevent compiler optimizations?

Question

C++ compilers are allowed to elide or combine allocations. However, it seems that if the allocated memory is accessed with atomic operations (even with relaxed memory order), GCC and Clang do not elide the allocation.

#include <atomic>
#include <cstdint>

// new/delete are elided
std::uint64_t successfulElision() {
    auto ptr = new std::uint64_t{0};
    *ptr = 5;
    auto result = *ptr;
    delete ptr;
    return result;
}

// new/delete are not elided
std::uint64_t failedElision() {
    auto ptr = new std::uint64_t{0};
    std::atomic_ref<std::uint64_t> rf(*ptr);
    rf.store(5, std::memory_order_relaxed);
    auto result = rf.load(std::memory_order_relaxed);
    delete ptr;
    return result;
}

https://godbolt.org/z/sacMdbac5

What is the reason for this? Is this required by the standard?

While it's true that this is a potentially missed optimization, it's also the case that this optimization would not be productive. In real-world code, you use an atomic because the variable is shared between threads. There's no point creating an atomic and using it on only one thread. The only practical cases for this optimization are artificial examples like yours, where somebody creates an atomic yet for some strange reason doesn't share it between threads. — Raymond Chen, Jan 04 '22 at 13:01
I was looking to store an atomic in the coroutine's task::promise_type to hold the coroutine's continuation. General wisdom is to not do this, and instead to suspend coroutines at initial_suspend so that continuations can be stored without requiring synchronization. However, I was not convinced that avoiding the synchronization is that valuable, as on x86 acquire and release memory operations have little to no additional cost. On the fast path, the coroutine is never suspended and sets the continuation from the same thread. On the slow path, the continuation may be set from another thread. — Altan, Jan 04 '22 at 13:13
I wanted to benchmark the two approaches (continuations via atomics vs initial_suspend), but I was discouraged when I found out that using atomic operations may stop the compiler from eliding the coroutine allocations. — Altan, Jan 04 '22 at 13:21
By the way, thank you @RaymondChen for your amazing coroutine tutorials! I am trying to explore and better understand coroutines and they have been incredibly valuable. — Altan, Jan 04 '22 at 13:24
Another reason why I felt that continuations via atomics made sense is the existence of coroutine_handle::done. If I am not mistaken, coroutine_handle::done also potentially communicates state across threads via memory. If coroutine_handle::done can do it, I should be able to do it too. — Altan, Jan 04 '22 at 13:30
`coroutine_handle::done` requires that the coroutine already be suspended. This means that `done` doesn't need to do any synchronization: if the coroutine is used from multiple threads, it is the caller's responsibility to synchronize `done` against the suspend. — Raymond Chen, Jan 04 '22 at 14:09
Related: [Why don't compilers merge redundant std::atomic writes?](https://stackoverflow.com/q/45960387) — Peter Cordes, Mar 27 '23 at 22:19

score -1 · Answer 1 · answered Jan 04 '22 at 12:48

You pass the pointer to a function, so you cannot assume the allocation may be optimized out. If you replace the atomic operation with a call to an external function, the result is the same: https://godbolt.org/z/GsYjrb6z5

Karol T.

Atomic operations are not external functions though; they are inlined. If I call an inline function to modify the memory, the allocation is still elided: https://godbolt.org/z/xeaPav7o3 – Altan Jan 04 '22 at 12:52
The atomic functions do call compiler builtins such as __atomic_load_n, but compiler builtins are not external functions either. The allocation is still elided if I call __builtin_memcpy, for example: https://godbolt.org/z/Txda3j5ob – Altan Jan 04 '22 at 12:55
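The two variants described in these comments might look roughly like the sketch below. The contents of the linked Godbolt pages are not reproduced here, so the function names (`storeValue`, `viaInlineFunction`, `viaBuiltinMemcpy`) are assumptions made for illustration. `__builtin_memcpy` is a GCC/Clang builtin and is not available on every compiler. Whether the allocation is actually elided depends on the compiler and optimization level, so checking the generated assembly is the only reliable test.

```cpp
#include <cstdint>

// Hypothetical helper: small enough that the compiler can inline it.
inline void storeValue(std::uint64_t* p, std::uint64_t v) { *p = v; }

// Variant 1: the heap memory is modified through an inline function.
std::uint64_t viaInlineFunction() {
    auto ptr = new std::uint64_t{0};
    storeValue(ptr, 5);
    auto result = *ptr;
    delete ptr;
    return result;
}

// Variant 2: the heap memory is modified through a compiler builtin
// (GCC/Clang only).
std::uint64_t viaBuiltinMemcpy() {
    auto ptr = new std::uint64_t{0};
    const std::uint64_t value = 5;
    __builtin_memcpy(ptr, &value, sizeof value);
    auto result = *ptr;
    delete ptr;
    return result;
}
```

Compiling these next to `failedElision` at `-O2` and comparing the output for calls to `operator new` is one way to reproduce the comparison the comments describe.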
