Yes, C++ defines data race UB as potentially-concurrent access to non-atomic
objects when not all the accesses are reads. Another recent Q&A quotes the standard, including:
[intro.races]/2 - Two expression evaluations conflict if one of them modifies a memory location ... and the other one reads or modifies the same memory location.
[intro.races]/21 ... The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, ...
Any such data race results in undefined behavior.
That gives the compiler freedom to optimize code in ways that preserve the behaviour of the thread executing a function, but not what other threads (or a debugger) might see if they go looking at things they're not supposed to. (i.e. data race UB means that the order of reading/writing non-atomic variables is not part of the observable behaviour an optimizer has to preserve.)
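As a minimal sketch of the definition quoted above: several threads modifying the same memory location is only well-defined if every access is atomic. The helper name count_with_atomic and its parameters are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each increment one shared counter.
// Every access to the counter is atomic, so the concurrent increments
// conflict but do not form a data race.
int count_with_atomic(int num_threads, int iters_per_thread) {
    assert(num_threads > 0 && iters_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters_per_thread] {
            for (int i = 0; i < iters_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

With a plain int counter and ++counter in the lambda, the same program would contain a data race and so have undefined behaviour; the optimizer would be free to assume no other thread touches the counter.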
introducing reads/writes may change other thread's behavior
The as-if rule allows you to invent reads, but no, you can't invent writes to objects this thread didn't already write. That's why if(a[i] > 10) a[i] = 10; is different from a[i] = a[i]>10 ? 10 : a[i]. The second form stores to every element, even the ones whose value doesn't change.
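To make the difference concrete, here are the two forms written out as functions. The function names are hypothetical. Single-threaded, they leave the array in the same state; they differ only in which elements they store to, which is exactly what matters once other threads are involved.

```cpp
#include <cassert>
#include <cstddef>

// Stores only to elements that actually exceed 10.
// Elements <= 10 are read but never written.
void clamp_conditional(int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] > 10) a[i] = 10;
    }
}

// Stores to every element, including ones whose value is unchanged.
// A compiler may not turn clamp_conditional into this form, because
// that would invent writes the source never performed.
void clamp_unconditional(int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = a[i] > 10 ? 10 : a[i];
    }
}
```

The reverse transformation is a different story: the source of clamp_unconditional already writes every element, so the programmer has already promised that no other thread is accessing them concurrently.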
It's legal for two different threads to write a[1] and a[2] at the same time, and one thread loading a[0..3] and then storing back some modified and some unmodified elements could step on the store by the thread that wrote a[2].
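A small sketch of the legal case, with a made-up function name: a[1] and a[2] are distinct memory locations, so two threads can write them concurrently without any synchronization beyond the final join.

```cpp
#include <array>
#include <cassert>
#include <thread>

// Hypothetical demo: each thread writes only its own array element.
// Distinct elements are distinct memory locations, so the two stores
// do not conflict and there is no data race.
std::array<int, 4> write_distinct_elements() {
    std::array<int, 4> a{0, 0, 0, 0};
    std::thread t1([&a] { a[1] = 11; });
    std::thread t2([&a] { a[2] = 22; });
    t1.join();
    t2.join();
    // Neither thread wrote these, so they must be untouched.
    assert(a[0] == 0 && a[3] == 0);
    return a;
}
```

If the compiler implemented t1's store as a 16-byte load of the whole array followed by a 16-byte store, it could write back a stale a[2] and lose t2's update, which is why it isn't allowed to do that.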
Crash with icc: can the compiler invent writes where none existed in the abstract machine? is a detailed look at a compiler bug where ICC did that when auto-vectorizing with SIMD blends. It includes links to Herb Sutter's atomic weapons talk, where he discusses the fact that compilers must not invent writes.
By contrast, AVX-512 masking and AVX vmaskmovps etc. (and, I think, ARM SVE and the RISC-V vector extension) do have proper masking with fault suppression, so they can actually not store at all to some SIMD elements, without branching.
It's legal to invent atomic RMWs (except without the Modify part, i.e. ones that leave the bytes this thread isn't writing unchanged), e.g. an 8-byte lock cmpxchg [rcx], rdx if you want to modify some of the bytes in that region. But in practice that's more costly than just storing modified bytes individually, so compilers don't do that.
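As a source-level analogy only (this is not something compilers actually emit, and the helper name is hypothetical), a compare-exchange loop on a std::atomic<std::uint64_t> shows why an atomic RMW is safe where a plain load + store is not: bytes this thread doesn't intend to change are written back only if nobody else changed them in the meantime.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: replace byte `index` (0 = least significant) of an
// 8-byte word. The CAS retries if any byte of the word changed between
// the load and the exchange, so another thread's update to a neighbouring
// byte is never overwritten with a stale value.
void store_byte_atomically(std::atomic<std::uint64_t>& word,
                           unsigned index, std::uint8_t value) {
    assert(index < 8);
    const unsigned shift = index * 8;
    const std::uint64_t mask = std::uint64_t{0xFF} << shift;
    std::uint64_t expected = word.load(std::memory_order_relaxed);
    std::uint64_t desired;
    do {
        desired = (expected & ~mask) | (std::uint64_t{value} << shift);
        // On failure, compare_exchange_weak reloads `expected` for the retry.
    } while (!word.compare_exchange_weak(expected, desired,
                                         std::memory_order_relaxed));
}
```

On x86-64 a loop like this typically compiles to lock cmpxchg, which is much more expensive than a plain byte store.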
Of course, a function that does unconditionally write a[2] can write it multiple times, and with different temporary values, before eventually updating it to the final value. (Probably only a Deathstation 9000 would invent different-valued temporary contents, like turning a[2] = 3 into a[2] = 2; a[2]++;.)
For more about what compilers can legally do, see Who's afraid of a big bad optimizing compiler? on LWN. The context for that article is Linux kernel development, where they rely on GCC to go beyond the ISO C standard and actually behave in sane ways that make it possible to roll their own atomics with volatile int* and inline asm. It explains many of the practical dangers of reading or writing a non-atomic shared variable.
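In portable ISO C++, as opposed to the kernel's volatile-based approach, the usual way to avoid those dangers is to make the shared variable a std::atomic. A minimal sketch with a hypothetical function name: one thread publishes a plain int via an atomic flag, and the other waits for the flag before reading it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: `ready` is atomic, so the waiting loop must really
// re-load it on every iteration. The release store / acquire load pair
// makes the write to `payload` happen-before the read, so the non-atomic
// `payload` is accessed without a data race.
int wait_for_flag() {
    std::atomic<bool> ready{false};
    int payload = 0;
    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;
    producer.join();
    assert(seen == 42);  // guaranteed by the release/acquire ordering
    return seen;
}
```

If ready were a plain bool, the loop would be a data race, and a compiler could legally hoist the load out of the loop and spin forever.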