14

I was looking at list 5.13 in C++ Concurrency in Action by Antony Williams:, and I am confused by the comments "the store to and load from y still have to be atomic; otherwise, there would be a data race on y". That implies that if y is a normal (non-atomic) bool then the assert may fire, but why?

#include <atomic>
#include <thread>
#include <assert.h>
bool x=false; 
std::atomic<bool> y;
std::atomic<int> z;

void write_x_then_y()
{
 x=true; 
 std::atomic_thread_fence(std::memory_order_release);
 y.store(true,std::memory_order_relaxed); 
}

void read_y_then_x()
{
 while(!y.load(std::memory_order_relaxed)); 
 std::atomic_thread_fence(std::memory_order_acquire);
 if(x) ++z;
}

int main()
{
 x=false;
 y=false;
 z=0;
 std::thread a(write_x_then_y);
 std::thread b(read_y_then_x);
 a.join();
 b.join();
 assert(z.load()!=0); 
}

Now let's change y to a normal bool, and I want to understand why the assert can fire.

#include <atomic>
#include <thread>
#include <assert.h>
bool x=false; 
bool y=false;
std::atomic<int> z;

void write_x_then_y()
{
 x=true; 
 std::atomic_thread_fence(std::memory_order_release);
 y=true;
}

void read_y_then_x()
{
 while(!y); 
 std::atomic_thread_fence(std::memory_order_acquire);
 if(x) ++z;
}

int main()
{
 x=false;
 y=false;
 z=0;
 std::thread a(write_x_then_y);
 std::thread b(read_y_then_x);
 a.join();
 b.join();
 assert(z.load()!=0); 
}

I understand that a data race happens on non-atomic global variables, but in this example if the while loop in read_y_then_x exits, my understanding is that y must either already be set to true, or in the process of being set to true (because it is a non-atomic operation) in the write_x_then_y thread. Since atomic_thread_fence in the write_x_then_y thread makes sure no code written above that can be reordered after, I think the x=true operation must have been finished. In addition, the std::memory_order_release and std::memory_order_acquire tags in two threads make sure that the updated value of x has been synchronized-with the read_y_then_x thread when reading x, so I feel the assert still holds... What am I missing?

manu652
  • 141
  • 6
  • 12
    One thing to consider: optimizing compilers may optimize away the loop, assuming that since `y` is not changed in the body. Atomic operations make it clear to the compiler that the change will happen in another thread, and that the result may be different. – Colonel Thirty Two Jun 16 '22 at 01:54
  • 1
    In fact, compilers like Clang and GCC actually do such optimizations: see https://godbolt.org/z/ocrxnee8T (the while loop is optimized out). – Jérôme Richard Jun 16 '22 at 10:39
  • In C++ ["data race"](http://eel.is/c++draft/intro.races#21) is a specific kind of UB that happens when the same variable is accessed from different threads without synchronization, and not (only) a remark about the program having unpredictable behavior depending on how threads are executed. – HolyBlackCat Jun 16 '22 at 18:02

2 Answers2

12

Accessing a non-atomic object in two threads unsynchronized with one of the accesses being a write access is always a data race and causes undefined behavior. This is how the term "data race" is formally defined in the C++ language and what it prescribes as its consequences. It is not merely a race condition which informally refers to multiple possible outcomes being allowed due to unspecified ordering of certain thread accesses.

The write in y=true; happens while the loop while(!y); is still reading y, which makes it a data race if y is non-atomic. The program would have undefined behavior, which doesn't just mean that the assert might fire. It means that the program may do anything, e.g. crash or freeze up.

The compiler is allowed to optimize under the assumption that this never happens and thereby optimize the code in such a way that your intended behavior is not preserved since it relies on the access causing the data race.

Furthermore, an infinite loop which doesn't eventually perform any atomic/synchronizing/volatile/IO operation also has undefined behavior. So while(!y); has undefined behavior if y is not an atomic and initially false and the compiler can assume that this line is unreachable under those conditions.

The compiler could for example remove the loop from the function for that reason, as actually does happen with current compilers, see comments under the question.

And I am also aware that especially Clang does perform optimization based on that and sometimes even goes so far as to completely drop all contents (including the ret instruction at the end!) from an emitted function with such an infinite loop, if it could not ever be called without undefined behavior. However here, because y might be true when the function is called, in which case there is no undefined behavior for that, this doesn't happen.

All of this is on the language level. It doesn't matter what would happen on the hardware level if the program was compiled in a most literal translation. These would be additional concerns, e.g. potential tearing of write access and potential cache incoherency between threads, but both of these are unlikely to be a problem on common platforms for a bool. Another problem might be though that the threads could keep a copy of the variable in a register, potentially never producing a store that the other thread could observe, which is allowed for a non-atomic non-volatile object.

user17732522
  • 53,019
  • 2
  • 56
  • 105
  • It's very unlikely that a program will crash only because of a data race, it just won't see the updated value consistently or at all. It's also somewhat misleading to say an infinite loop has undefined behavior. It may be good to explicitly say undefined behavior (in this context) means that you won't know if it will be optimized out or not. This answer provides some helpful additional nuance: https://stackoverflow.com/a/28681034/1562007 – Josh Larson Jul 03 '22 at 18:35
  • @JoshLarson An infinite loop that violates the forward-progress requirement has undefined behavior. It doesn't only mean that the loop may be removed. And when I say that the program may crash, I do mean that. For example, [here](https://godbolt.org/z/v7svjK8bc) is a small variation for which Clang's compiled program produces complete nonsense output and then segfaults. – user17732522 Jul 03 '22 at 19:04
  • @JoshLarson Sorry, the link doesn't actually produce the output I got initially. I got a segfault with stderr output `terminate called after throwing an instance of 'terminate called recursively terminate called recursively terminate called recursively terminate called recursively`. It may be timing-dependent. – user17732522 Jul 03 '22 at 19:16
  • I replicated your described behavior with that clang compiler with optimization enabled, although a compiler generating garbage (omitting entire functions, breaking the function pointers) isn't really a fair optimization at all, it seems more like a bug in clang – Josh Larson Jul 03 '22 at 19:20
  • @JoshLarson As far as I know it is very much intentional. It is deadcode elimination taken as far as possible. Executing the loop statement would violate forward-progress guarantees. Because forward-progress is imposed as requirement on all programs, the loop statement can therefore never be reached. Consequently it must be dead code. But if it is dead code, then the whole function must also be dead code as there is no way to execute it without executing the loop. Therefore no instructions need to be emitted for the function at all. Clang strips the `ret` as well, causing the weird behavior. – user17732522 Jul 03 '22 at 19:27
  • gcc and msvc both produce the infinite loops, and it looks like clang is technically calling main() again because of that label, and that's what's making it crash. I can't imagine that's intentional. Here's an example of it successfully returning 0 (by removing the assertion): https://godbolt.org/z/K86aM9v5o) – Josh Larson Jul 03 '22 at 19:50
  • @JoshLarson I couldn't find a good match, but consider for example [this bug report](https://github.com/llvm/llvm-project/issues/54284) where the behavior completely changes due to such a loop or [this one](https://github.com/llvm/llvm-project/issues/44466) where a segmentation fault results and it is considered a bug only because C has slightly different rules than C++ (specifically loops with controlling expressions which are constant expressions are excluded from forward-progress). Also zygoloid explaining this in the ticket is Richard Smith who was/is the editor for the C++ standard. – user17732522 Jul 03 '22 at 21:11
5

If you write this:

bool y=false;
...
while(!y); 

then the compiler can assume y will not change by itself. The body of the while is empty so either y is true at the start and you have an endless loop or y is false at the start and the while ends.

The compiler can optimize this into:

if (!y) while(true);

But c++ also says that there must always be progress, an infinite loop is UB, so the compiler may do whatever it likes when it sees a while(true);, including removing it. gcc and clang will actually do that as Jerome pointed out here: https://godbolt.org/z/ocrxnee8T

So what the std::atomic<bool> y; does is the modern form of marking y as volatile. The compiler can no longer assume that repeated reads of y give the same result and can no longer optiomize away the while(!y); loop.

Depending on the architecture it will also insert necessary memory barriers so changes to the variable become observable to other threads, which is more than volatile would have done.

Goswin von Brederlow
  • 11,875
  • 2
  • 24
  • 42
  • Atomic and `volatile` are different things. Modern code might still need `volatile` for things unrelated to multithreading. And properly written old code (though I'm less sure about this) wouldn't use them in place of atomics. – HolyBlackCat Jun 16 '22 at 19:34
  • Modern code needs volatile for hardware MMIO. Everywhere else I believe you are better served with atomic. And no, even in old code volatile isn't a replacement for atomic. But atomic replaces most obsolete uses of volatile. – Goswin von Brederlow Jun 16 '22 at 19:39
  • I agree. My point is that *"`std::atomic y;` ... is the modern form of marking y as volatile"* implies that modern code shouldn't use volatile, and that using it for synchronization is just "not modern", and not invalid. – HolyBlackCat Jun 16 '22 at 19:42
  • @HolyBlackCat Yes to not using volatile, yes to using volatile for synchronization is bad. As for it ever being valid: Volatile on it's own was never sufficient for synchronization, just required. – Goswin von Brederlow Jun 16 '22 at 19:48