
I can't seem to get the address of an atomic object after a store.

e.g.

std::atomic<int> i;
std::atomic<int>* p = &++i; // doesn't work
auto* p = &++i; // doesn't work
// below works:
++i;
auto* p = &i;

What's happening here and why?

To clarify: I know it returns an rvalue. Why doesn't it return the original object (i.e. `*this`)? Is this a purposeful design choice, or was it an oversight?

More specifically, what's happening under-the-hood for this requirement?

curiousguy
j__

1 Answer


While the pre-increment operator usually returns its operand by reference, in the case of std::atomic integers, it returns the new value as a temporary. So in your example ++i does not return a reference to the atomic<int> i itself, it returns the new value of i (i.e. an int). You can see this at: https://en.cppreference.com/w/cpp/atomic/atomic/operator_arith
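For example (a minimal sketch of the same point; the variable names are just illustrative):

std::atomic<int> i{0};
int v = ++i;               // fine: ++i yields the new value as a plain int
// auto* p = &++i;         // error: that int is a temporary, so you can't take its address
std::atomic<int>* p = &i;  // the address has to come from the atomic object itself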

It would be misleading and even dangerous to return a reference to the original atomic<int>, because reading the int value through that reference would require a second, separate load — and by then the value might differ from the value at the time of the increment. (This isn't particularly relevant to your example code, since you only want a pointer to the object, but plenty of code does use the value produced by ++, and that is why returning a reference would be unsafe.)

In other words, if ++i returned a reference to the atomic<int> i, then

int j = ++i;

would be equivalent to

++i;
// ...other threads may modify the value of `i` here!...
int j = i;

The whole point of atomics is to perform reads and writes together as an indivisible operation, so ++i must internally use a hardware/OS atomic operation to read and increment the integer in a single step, and the new value that operation produces is returned as a temporary.

If you're curious to see what's under the hood, here is libc++'s implementation where you can see that operator++ simply calls into fetch_add(1) and returns the result + 1.
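In rough outline, that boils down to the following equivalence (a sketch of the idea, not libc++'s exact source):

std::atomic<int> i{0};
int v1 = ++i;                  // atomic read-modify-write; yields the new value as a plain int
int v2 = i.fetch_add(1) + 1;   // equivalent: fetch_add returns the *old* value, so add 1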

jtbandes
  • Seems kind of stupid. Why not just make a lazy-eval type that can do both? This way there's no unnecessary load so that you may do stuff like `++i` without the fetch, and also can still do the fetch immediately without any desync issues. – j__ Aug 22 '20 at 02:53
  • That might be technically possible, but I don't think it would really lead to safer, more readable code; rather it would make it easier to get things subtly wrong (and complicating the library at that). – jtbandes Aug 22 '20 at 02:56
  • C++ STL has no responsibility to ensure things aren't "subtly wrong". There's plenty of things you can do to misuse the library already, what's one more? Especially since it's possibly 10 lines of code that leads to theoretically *faster* codegen. – j__ Aug 22 '20 at 02:58
  • @lajoh90686: Because optimizing atomics is a hard problem that language standards + compilers haven't solved yet. [Can and does the compiler optimize out two atomic loads?](https://stackoverflow.com/q/41820539). In this case, what would the advantage be? For it to be usable, you'd need a *guarantee* that you got the same value that `++i` stored, not just the *option* for the compiler to optimize that way, so it makes vastly more sense to return a `T` by value instead of an `atomic` reference. – Peter Cordes Aug 22 '20 at 03:00
  • At any rate, I think the practice of providing atomic `fetch_add` instructions goes all the way to hardware, which is to say I'm not aware of (and with a quick search can't find any) operations of the form "increment atomically but *don't* return any value". Given that's how it works in hardware there's not really an extra cost to this design. – jtbandes Aug 22 '20 at 03:00
  • @jtbandes: Note that the problem isn't that the subsequent load wouldn't be "synchronized". It's on the same object in the same thread, so the RMW "happens before" the load from anything that would reference the hypothetical return-by-reference object. The problem is that it defeats the atomicity, letting that later load possibly depend on something another thread did, separate from the `++` this thread did. – Peter Cordes Aug 22 '20 at 03:02
  • That's what I meant to convey but I clearly didn't use sufficiently precise terminology; my apologies :) – jtbandes Aug 22 '20 at 03:03
  • IMO it's an important concept that barriers / ordering don't create atomicity. And that atomic RMW with `mo_relaxed` can still be atomic, just not ordered wrt. other modifications. (Defining `++` in terms of `fetch_add` is another reason it needs to return a `T` value; `fetch_add` clearly needs to return the value, not a reference.) – Peter Cordes Aug 22 '20 at 03:07
  • re: hardware increment without returning the value: x86 can do this: `lock add dword [mem], 1` doesn't put the old or new value in a register, only updates FLAGS based on it. If you want the actual value, you need `lock xadd [mem], eax` to exchange-and-add. Or for operations other than +-, you need a CAS loop if you use the result of `atomic |= 1` or something. There's no asm equivalent for `fetch_or`, only atomic `lock or [mem], reg` or immediate. related: [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850). – Peter Cordes Aug 22 '20 at 03:10
  • LL/SC machines always need retry loops and always have the old and new values in registers that you can leave around for later, but ISAs like x86 that can make a memory-destination instruction atomic need special instructions to also put the memory value into a register like `fetch_*` – Peter Cordes Aug 22 '20 at 03:11
  • Of course, when you *don't* use the return value, good compilers *do* optimize to `lock or` or whatever. https://godbolt.org/z/MKvzcc (see the sketch after these comments). – Peter Cordes Aug 22 '20 at 03:14
  • *The whole point of atomics is to perform reads and writes together as an indivisible operation* - not quite correct. Another important use-case is pure-load and pure-store operations. Some algorithms don't need any RMWs on one object, but do need loads and stores to it to be atomic (no tearing) and often also ordered. And at a C++ level, free from UB, so rolling your own with `volatile` is almost always a bad idea. (And of course using just plain variables is totally broken in practice by common optimizations like keeping a value in a register, which assumption of no data-race UB allows) – Peter Cordes Aug 22 '20 at 03:57
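To make the hardware discussion in the comments concrete, here is a sketch (assuming a typical x86-64 gcc/clang build, as in the godbolt link above; the function names are just illustrative):

// Result unused: on x86, gcc/clang can compile this to a memory-destination
// `lock add dword ptr [rdi], 1`, leaving no value in a register.
void bump(std::atomic<int>& i) { ++i; }

// Result used: the fetched value must come back in a register, so this
// typically compiles to `lock xadd` (exchange-and-add) followed by an add of 1.
int bump_and_read(std::atomic<int>& i) { return ++i; }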