In the C/C++ memory model, can a compiler just combine and then remove redundant/NOP atomic modification operations, such as:
x++,
x--;
or even simply
x+=0; // return value is ignored
for an atomic scalar x?
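For concreteness, here is a minimal sketch of the two patterns, assuming x is a namespace-scope std::atomic<int>. The function names inc_then_dec and add_zero are hypothetical, introduced only for illustration. The built-in operators on std::atomic are sequentially consistent read-modify-write operations. The sketch only shows that the stored value ends up unchanged in a single-threaded run; whether a compiler may drop the operations is exactly what is being asked.

```cpp
#include <atomic>

std::atomic<int> x{5};  // shared atomic scalar; the initial value is arbitrary

// First pattern: two sequentially consistent RMWs whose effects cancel out.
int inc_then_dec() {
    x++, x--;           // equivalent to x.fetch_add(1), x.fetch_sub(1)
    return x.load();
}

// Second pattern: a sequentially consistent RMW that adds nothing;
// the value returned by the operation is ignored.
int add_zero() {
    x += 0;             // equivalent to (void)x.fetch_add(0)
    return x.load();
}
```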
Does that hold for sequential consistency or just weaker memory orders?
(Note: By weaker memory orders I mean orders that still do something; for relaxed, there is no real question here. EDIT AGAIN: No, actually there is a serious question in that special case. See my own answer. Not even relaxed is cleared for removal.)
EDIT:
The question is not about code generation for a particular access: if I wanted to see two lock add instructions generated on Intel for the first example, I would have made x volatile.
The question is whether these C/C++ operations have any impact whatsoever: can the compiler just filter out and remove these null operations (that are not relaxed-order operations), as a sort of source-to-source transformation (or abstract-tree-to-abstract-tree transformation, perhaps in the compiler "front end")?
EDIT 2:
Summary of the hypotheses:
- not all operations are relaxed
- nothing is volatile
- atomic objects are really potentially accessible by multiple functions and threads (no automatic atomic whose address isn't shared)
Optional hypothesis:
If you want, you may assume that the address of the atomic is not taken, that all accesses are by name, and that all accesses satisfy one of the following properties:
(1) No access of that variable, anywhere, has a relaxed load/store element: all load operations have acquire and all stores have release (so all RMWs are at least acq_rel).
(2) Or, for those accesses that are relaxed, the accessing code doesn't read the value for any purpose other than changing it: a relaxed RMW does not keep the value around (and does not test the value to decide what to do next). In other words, there is no data or control dependency on the value of the atomic object unless the load has an acquire.
(3) Or all accesses of the atomic are sequentially consistent.
That is, I'm especially curious about these (I believe quite common) use cases.
Note: an access is not considered "completely relaxed" even if it's done with a relaxed memory order, when the code makes sure observers have the same memory visibility, so this is considered valid for (1) and (2):
atomic_thread_fence(std::memory_order_release);
x.store(1,std::memory_order_relaxed);
as the memory visibility is at least as good as with just x.store(1,std::memory_order_release);
This is considered valid for (1) and (2):
int v = x.load(std::memory_order_relaxed);
atomic_thread_fence(std::memory_order_acquire);
for the same reason.
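A minimal sketch of how the two fence patterns above pair up, assuming a plain int named payload and helper functions producer, consumer and run_once (all hypothetical names, not part of the question). The writer uses a release fence followed by a relaxed store; the reader uses a relaxed load followed by an acquire fence, so the write to payload should be visible once the reader sees x == 1.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0};
int payload = 0;  // ordinary, non-atomic data published through x

void producer() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);
    x.store(1, std::memory_order_relaxed);
}

int consumer() {
    int v;
    do {
        v = x.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        if (v == 0) std::this_thread::yield();
    } while (v == 0);
    // The acquire fence after the load that read 1 synchronizes with the
    // producer's release fence, so reading payload here is not a data race.
    return payload;
}

// Runs one producer/consumer pair and returns what the consumer saw.
int run_once() {
    x.store(0);
    payload = 0;
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Replacing the fence and relaxed store with x.store(1, std::memory_order_release), or the relaxed load and fence with x.load(std::memory_order_acquire), would give the reader the same guarantee in this sketch.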
This is stupidly, trivially valid for (2) (i is just an int):
i=x.load(std::memory_order_relaxed),i=0; // useless
as no information from a relaxed operation was kept.
This is valid for (2):
(void)x.fetch_add(1, std::memory_order_relaxed);
This is not valid for (2):
if (x.load(std::memory_order_relaxed))
f();
else
g();
as a consequential decision was based on a relaxed load, neither is
i += x.fetch_add(1, std::memory_order_release);
Note: (2) covers one of the most common uses of an atomic, the thread-safe reference counter. (CORRECTION: It isn't clear that all thread-safe counters technically fit the description, as the acquire may be done only when the count reaches 0 after a decrement, and then a decision was taken based on counter>0 without an acquire; a decision to not do something, but still...)
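To make the reference-counter case concrete, here is a minimal sketch of one commonly used scheme, assuming a hypothetical RefCounted type with helpers acquire_ref and release_ref; it is an illustration, not a description of any particular library. The increment is a relaxed RMW whose result is discarded, which fits (2). The decrement uses release, and the acquire fence runs only on the path where the count reached zero, which is the situation the correction above describes: the "count is still positive" branch is taken without any acquire.

```cpp
#include <atomic>

struct RefCounted {
    std::atomic<int> count{1};  // one owner at construction
    bool destroyed = false;     // stands in for actually freeing the object
};

// Taking a reference: relaxed RMW, return value ignored.
void acquire_ref(RefCounted& r) {
    (void)r.count.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference: returns true if this call dropped the last one.
bool release_ref(RefCounted& r) {
    if (r.count.fetch_sub(1, std::memory_order_release) == 1) {
        // Only the thread that saw the count go to zero acquires.
        std::atomic_thread_fence(std::memory_order_acquire);
        r.destroyed = true;
        return true;
    }
    return false;  // decision made on the old value without an acquire
}
```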