
Let `std::atomic<std::int64_t> num{0};` be defined somewhere accessible/visible in the code. Is a C++ compiler allowed to replace each of the following two snippets with empty code (something that does nothing)? Similarly, are these optimizations allowed to happen at runtime? I am just trying to get a better understanding of how things work.

```cpp
num.fetch_add(1, std::memory_order_relaxed);
num.fetch_sub(1, std::memory_order_relaxed);
```

and

```cpp
num.fetch_add(1, std::memory_order_relaxed);
std::this_thread::yield();
num.fetch_sub(1, std::memory_order_relaxed);
```
Alex Guteniev
Koosha
    No optimizations are done at runtime. They're all done by the compiler. – Ken White May 05 '20 at 01:46
  • `num.fetch_add(1,std::memory_order_relaxed); num.fetch_sub(1,std::memory_order_relaxed);` This isn't a NOOP, as another thread can access it in between the operations and do something with it. So it shouldn't be replaceable by a NOOP. Yielding simply tries to let other threads continue with their execution. Otherwise, reading/writing an atomic in an infinite loop until some condition is met might take too many resources. – ALX23z May 05 '20 at 02:28
  • @ALX23z actually this _is_ replaceable with a NOOP, because it only uses memory_order_relaxed. A similar example is presented by JF Bastien in [N4455 No Sane Compiler Would Optimize Atomics](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html). – mpoeter May 05 '20 at 07:23
  • In theory such optimization is possible. In practice it doesn't seem that compilers do that: https://godbolt.org/z/Gg2Sub – freakish May 05 '20 at 07:25
  • If you had a stronger memory order, it might stop other operations from reordering across the add/sub pair, a bit like an `atomic_thread_fence()`. (But note that fences are stronger than single operations). But with relaxed, the compiler would be allowed to decide that every other thread always sees the ordering where these went back to back and cancelled out. See the bottom of my answer on [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850) for discussion of that. Note that `volatile atomic` would definitely block this optimization. – Peter Cordes May 05 '20 at 07:37
  • @mpoeter - relaxed ensures atomicity of operations on this variable. If another thread uses this variable, shouldn't it be able to see the changes made in another thread by continuously loading the variable? So no - it isn't replaceable by a NOOP. – ALX23z May 05 '20 at 13:40
  • @ALX23z no, there is no guarantee that this "intermediate" value will ever be visible. You would not be able to distinguish whether you do not see it because the optimization has been applied or whether you never read it at "the right time". If the two operations are next to each other (no other operations between them), then this optimization could even be performed with memory_order_seq_cst. Take a look at the link I posted for more details. – mpoeter May 05 '20 at 17:03

1 Answer


I think in theory yes, and even `yield` does not help.

But in practice no: not today, though it may become possible in the future.

A "runtime optimization" may happen if modifications coalesce in the memory system. I don't know whether this situation can happen in practice, but in any case it is not really distinguishable from "no other thread manages to observe the modified value before it changes back".

In fact, the effect of the optimization is equivalent to "no other thread manages to observe the modified value before it changes back", whether it is a compiler optimization or a runtime one. `yield` is not guaranteed to help, at least because it merely "gives an opportunity to reschedule", which an implementation may decide to ignore. And in theory there is no synchronization between `yield` and an atomic operation.

On the other hand, what do you expect to achieve with this?

Alex Guteniev
  • I would also like to add a reference to [N4455 No Sane Compiler Would Optimize Atomics](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html), as it contains more examples and details about possible optimizations - many of which are not only possible but encouraged. So I would strike the "in practice no". Even if today's compilers do not perform these optimizations, there is no guarantee that this won't change with newer versions. – mpoeter May 05 '20 at 07:41
  • @mpoeter: I think there's a tacit agreement among compiler devs that until C++ invents a way for programmers to clearly express what they mean, and definitely prevent compilers from sinking relaxed progress-bar stores out of loops for example, compilers should continue to treat atomics kind of like volatile. – Peter Cordes May 05 '20 at 07:44
  • @PeterCordes I understand the concerns discussed in P0062R1, but I think some of the cases presented in N4455 are still perfectly valid. And the conclusion in P0062R1 seems to suggest that we should mark the code that should _not_ be optimized. But that means that code written today could be optimized in the future unless it is adapted, so we should at least be cautious about possible optimizations. – mpoeter May 05 '20 at 07:51
  • @mpoeter: Yes, N4455 shows some clear examples of cases that would benefit from compilers being able to optimize. I think the mechanism is likely to be some new syntax or type attribute that lets new code opt in to such optimizations, so existing code doesn't risk being broken by it. i.e. it seems that many people agree the ISO C++ standard isn't currently strong enough for what programmers want `atomic` to do in every case, although `volatile atomic` plus sane compilers that don't move stores across very long loops comes close. But yes, compilers *might* start optimizing plain `atomic`. – Peter Cordes May 05 '20 at 08:08
  • In MSVC, the `/volatile:ms` vs `/volatile:iso` switch controls a similar sort of issue (whether a properly sized/aligned `volatile` is an acquire/release atomic or not). Compilers might do this via a switch or `#pragma` instead of having yet another attribute. – Alex Guteniev May 05 '20 at 08:16