Is there a data race between writes that nominally happen under mutually exclusive conditions?

Question

Consider the following code snippet:

// Main thread
int non_atom = 0;
bool x = _ // some value not known statically, depends on run-time arguments and complex calculations.

// Thread 1 and 2 are started

// Thread 1
if (!x) {
    non_atom = 5; // only write to non_atom if not x
}

// Thread 2
int bar = 0;
if (x) {
    bar = non_atom; // only access non_atom if x
}

At runtime only either T1 or T2 will access non-atom, so there should be no data race. But let's say the compiler decides to invert the if-condition in T2 like this:

int bar = non-atom;
if (!x) {
    bar = 0;
}

This isn't simply a reordering but rather an inversion of the conditional and its effects. There is now a data-race, because if x==true both threads access non-atom without synchronization.

Is the compiler allowed to do this? Let's assume that the compiler cannot statically prove that only one thread accesses x, for example because the code in T1 and T2 is in functions in separate translation units. Is the compiler allowed to introduce a data-race where dynamically, there wasn't one before, or rather, are compiler careful enough not to do such a transformation even if no atomics are involved anywhere? If I encounter such a situation in the real world, should I use an atomic instead just to be safe?

This question is similar to this older question: Data race guarded by if (false)... what does the standard say?

However, in this old question the conditional was statically always false, so it was clear that it should never be evaluated. In this question, the runtime value is not known to the compiler at compile-time, and the optimization is not a simple reordering.

This question was inspired by a post by Russ Cox (ctrl-f "Note that all these optimizations") on the Go Memory Model, in which he claims that such transformations ARE allowed by C++ compilers, in contrast to Go's.

No, the compiler can't break correct code under the as-if rule. Even if it introduces a load, it must do so in a way that doesn't break code. — HolyBlackCat, Aug 20 '23 at 11:13
In this case, it's assumed that the compiler cannot know at compile-time that the program is correct for all inputs, since there is no synchronization. When compiling the function inside T2, it might assume that any other concurrent access to non-atom would be UB anyway, since it's not an atomic, so it must only concern itself with correctness inside this function. — JMC, Aug 20 '23 at 11:22
As long as *you* can somehow guarantee the initialisation of `x` happens before either thread accesses `x` (e.g. `x` is initialised before either of the two threads are created) there is no data race. If the compiler somehow breaks that, then the compiler is (seriously) flawed. If you can't offer that guarantee (e.g. because the initialisation may happen after either thread is launched) then behaviour is undefined. — Peter, Aug 20 '23 at 11:24
The modified T2 example is not equivalent to the unmodified one. — user3840170, Aug 20 '23 at 11:25
All optimizations (except a few select ones, like NRVO) are limited by the as-if rule. The compiler can't break correct code. If it does, it's a bug. — HolyBlackCat, Aug 20 '23 at 11:28
@Peter Thanks. If we had, instead, written the code so T1 *always* writes without a conditional. Could we then also say "the code is guaranteed data race free, as long as I can guarantee that the program only started with the correct input so that x is false at runtime?" So is it valid to have well-defined code for some inputs even though it's UB for some other input? — JMC, Aug 20 '23 at 11:30
Yes, that would be valid. It's the same thing as asking if `int x, y; cin >> x >> y; cout << (x/y);` is legal or not due to possible divide-by-zero, if the user never inputs `0`. — HolyBlackCat, Aug 20 '23 at 11:35
@JMC - No. If Thread 1 always assigns to `non_atom` then you have a race condition, since modification of `non_atom` and accessing the value of `non_atom` in different threads are unsequenced. That would be true regardless of program input. — Peter, Aug 20 '23 at 12:30
@Peter The value of x depends on the program input and T2's access still depends on x in that hypothetical , so in the case that the input is such that x will be false, only T1 would access non_atom (unless the compiler does the transformation), because T2's conditional-branch should not be executed if no optimization is performed. So it should still be defined for inputs that let x be false, or not? — JMC, Aug 20 '23 at 12:35
@JMC - As per my first comment, it will be OK if you can *guarantee* that `x` will be initialised (i.e. program input is read, and the variable initialised accordingly) before either of your threads access its value. One way to do that is for the main thread to set the value of `x` before it launches the two threads. But if the main thread launches either of the other two threads, and then obtains inputs and initialises `x`, you don't have correct sequencing and behaviour is undefined. — Peter, Aug 20 '23 at 13:16
@Peter of course, I was always operating under the assumption that `x` would be initialized based on input before the threads, and never changed, even in the other scenario in which T1 always writes to `non-atom`. I should have made that clearer in the question. — JMC, Aug 20 '23 at 14:12

user17732522 · Accepted Answer · 2023-08-20T18:32:42.970

2

The compiler is not allowed to transform the code that way on the source level, in the sense that it may not consider the programs equivalent. They are not equivalent because one has undefined behavior while the other doesn't.

However, the compiler can of course make use of knowledge about the variables and the architecture to implement the function in non-straightforward ways in machine instructions.

For example, the compiler knows that non_atom has static storage duration and therefore the memory reserved for non_atom is still accessible without possibly causing a fault, regardless of the value of x.

Then, the only problem with performing an extra load of non_atom would generally be that it may cause a teared value to be read or that it may not see the value written by the other thread. There is nothing else that could happen, typically. But neither of these can affect the result if the original program didn't execute a path with UB (data race). So doing the load independent of the value of x is not a problem.

So the compiler can do the load unconditionally in the machine code. It doesn't affect observable behavior of the program on any path that has a defined observable behavior. Whether they actually do that is a different question.

One exception to the above that I am aware of are floating-point values on x86 with floating point exceptions enabled and using the legacy x87 instructions (see comments below this answer). In that case loading the value unconditionally may cause a floating point exception, e.g. if the value read is a signaling NaN if x is false, which would result in the observable behavior not being correct.

Actually, this is an open problem at least on GCC (but I would guess also other compilers) since it doesn't take this possibility into account when transforming to such a speculative load and can therefore cause a floating point exception through the speculative load even though the program doesn't execute any path with a data race or other UB on the C++ level. See this bug report. That's technically not standard-conforming (if the configuration with exceptions on signaling NaNs is intended to be so). But as the comments say it is hard to correctly take care of during optimization.

Also, you need to make sure of course that any modification of x is synchronized with the access in either of the two shown threads, e.g. by having x be written before these threads are started. Otherwise your program does have undefined behavior due to a data race, but on x, not on non_atomic.

edited Aug 20 '23 at 18:32

answered Aug 20 '23 at 11:36

user17732522

53,019
2
56
105

So in the case that the compiler doesn't have this additional knowledge (E.g. T1 and T2 only have pointers to `x` and `non-atomic`, the functions are in differents TU's etc.) it absolutely cannot consider them equal? Does that mean that one could generally say that compilers take into account the possibility for variables to be access from another thread even when they are not atomic? – JMC Aug 20 '23 at 11:41
1

@JMC If the compiler only has a pointer, it can't do this because the pointer value might not point to memory that is only accessible if `x` is true. Causing e.g. a page fault would cause the program to not behave as required. – user17732522 Aug 20 '23 at 11:42
@JMC "_Does that mean that one could generally say that compilers take into account the possibility for variables to be access from another thread even when they are not atomic?_": I don't know whether they do. I haven't checked. – user17732522 Aug 20 '23 at 11:44
@JMC Actually I just remembered one common exception, see my edit. – user17732522 Aug 20 '23 at 11:48
thanks. I believe that the first 2 sentences of your answer would imply that yes, compilers must take that into account. Because as you've said, one is UB, the other is not, and the UB comes from the data race, meaning that the compiler must consider the possibility of other accesses in order to determine whether the transformed version is UB or whether it is equivalent, and it must do so even if if doesn't know what T1 does (because T1's code is in another TU). Or am I mistaken in this conclusion? – JMC Aug 20 '23 at 12:15
@JMC Yes, the compiler must take it into account, but knowing the storage duration of the object that is going to be read and knowing about the architecture will generally be enough, regardless of what other code does. If the compiler introduces the speculative load, then either the value read that way isn't going to be used and the behavior of the program doesn't differ from the original's specified observable behavior or the original program follows an execution that has undefined behavior if the value is going to be used. In that case the compiler doesn't have to care about the result. – user17732522 Aug 20 '23 at 12:22
1

@JMC It is only a problem if the unnecessary speculative load will have other side effects on the architecture, e.g. a floating point exception as I mentioned, or a page fault if memory isn't accessible. If the compiler statically knows that this can't happen, then there is no problem with the speculative load. – user17732522 Aug 20 '23 at 12:23
I see, thanks. If T1 was modified to always execute its write-access to non-atom, but T2 was left as it is, this reasoning would still hold, would it not? As long as x is false (depending on program input), there would be no UB, so it can do the transformation because the teared-value would be overwritten in any non-UB-path. So even in this case (T1 writes unconditionally), as long as I know that x is always false at runtime, the program is valid? – JMC Aug 20 '23 at 12:47
2

@JMC Yes, and more generally, either the original program has a data race and UB for a given set of inputs or it doesn't. Whatever the compiler does, it must make sure that the behavior for inputs without UB is as specified. It can do whatever for paths with inputs that have UB. How the compiler achieves this is an implementation detail that the programmer doesn't need to care about. It can't affect the observable behavior as long as the programmer makes sure that the program doesn't have UB / a data race on the abstract machine. (Bugs and deviations from the standard aside.) – user17732522 Aug 20 '23 at 12:57
@JMC Also, just to make sure, I added the caveat that you were discussing in the question comments to my answer. – user17732522 Aug 20 '23 at 13:44
1

Interesting point about legacy x87 `fld` possibly raising an FP exception on SNaN or Denormal. https://www.felixcloutier.com/x86/fld#floating-point-exceptions. Since x87 loads of float or double convert to the internal 80-bit x87 format, unlike `long double` loads which just copy the bit-pattern unchanged and don't raise FP exceptions. None of this applies to modern x86 using SSE2 for scalar math; `movsd xmm, mem` doesn't examine the bit-pattern and can't raise a SIMD-FP math exception. – Peter Cordes Aug 20 '23 at 15:52
A hypothetical architecture with hardware race-detection would also disallow this transformation. (Presumably such an ISA would have special instructions for loads / stores on shared memory which `std::atomic<>` and locking could use.) But compilers know that the target CPUs they compile for doesn't have race detection so "data race UB" is only a C++ concept, not reflecting any assembly language behaviour (except that x87 case). See also https://lwn.net/Articles/793253/ re: compilers inventing loads (and how that's a problem for lock-free code that fails to use `_Atomic` or equivalent.) – Peter Cordes Aug 20 '23 at 15:58
1

@PeterCordes Thank you for providing details on the x87 issue. I also found the GCC bug report (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93934). I wonder whether other compilers handle this situation correctly or if it is even possible to do so without having to significantly reduce optimizations for such loads. – user17732522 Aug 20 '23 at 18:34

Is there a data race between writes that nominally happen under mutually exclusive conditions?

1 Answers1