
I am reading C++ Concurrency in Action, 2nd edition (2019). In chapter 5.3, the author gives a simple example:

#include <vector>
#include <atomic>
#include <iostream>
#include <thread>
#include <chrono>

std::vector<int> data;
std::atomic<bool> data_ready(false);

void reader_thread()
{
  while(!data_ready.load())  // 1
  {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  std::cout<<"The answer="<<data[0]<<"\n";  // 2
}
void writer_thread()
{
  data.push_back(42);  // 3
  data_ready=true;  // 4
}

Although the author only wants to explain how to use atomic_bool to control the ordering of code (i.e. synchronization) between threads, what confuses me is this: if I use a plain bool instead of atomic_bool, isn't it still thread-safe?

#include <vector>
#include <iostream>
#include <thread>
#include <chrono>

std::vector<int> data;
bool data_ready(false);

void reader_thread()
{
  while(!data_ready)  // 1
  {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  std::cout<<"The answer="<<data[0]<<"\n";  // 2
}
void writer_thread()
{
  data.push_back(42);  // 3
  data_ready=true;  // 4
}

I know that compound operations such as +=, -=, *=, |= are not thread-safe, but isn't a plain write/read of a POD variable, such as data_ready = true, thread-safe by itself?

I have also read other answers such as Difference between atomic and int, but to be honest, I don't get their point.

f1msch
  • Attempting to read from a normal bool variable while another thread attempts to write to it is a race condition. – user7860670 Jul 14 '23 at 09:25
  • Assume `bool` is atomic for the sake of argument. This code is still troublesome because compiler cannot tell in the normal flow of execution that the variable is accessed from multiple threads and therefore can optimize it away. It would need to be volatile at least. But even in such case it would not be portable, because atomicity of given types is highly platform-dependent. – alagner Jul 14 '23 at 09:25
  • `but only write/read on a POD variable isn't originally thread-safe itself` why would it be? I know embedded programming is somewhat different to PCs, but imagine 8bit architecture storing 16bit number register by register in two consecutive operations. Then imagine an interrupt coming the moment the first byte is stored, the other not yet. This is a classic example from AVR's programming manual ;) – alagner Jul 14 '23 at 09:29
  • Minor point: `while(!data_ready.load())` can be written more simply as `while(!data_ready)`. The load happens automatically. – Pete Becker Jul 14 '23 at 11:35

1 Answer


You cannot use a plain bool for inter-thread communication, because it can lead to a data race. A data race in C++ occurs when two threads access the same memory simultaneously, and at least one of these accesses is a write. Formally:

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.

Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.

- [intro.races]/21, [intro.races]/2

Consider what can happen:

// reader thread reads
while(!data_ready)

// writer thread writes
data_ready = true;

There is no synchronization that prevents these accesses from happening concurrently, making it a data race.
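To make the concurrency concrete, here is a minimal hypothetical driver; a main function like this is not part of the question's code, it only shows how the two conflicting accesses end up unordered:

#include <thread>

// Declarations of the question's functions (defined in the question above).
void reader_thread();
void writer_thread();

int main()
{
    // Nothing orders the write to data_ready in writer_thread relative to
    // the repeated reads in reader_thread, so the accesses can overlap.
    std::thread writer(writer_thread);
    std::thread reader(reader_thread);
    writer.join();
    reader.join();
}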

Even without a data race, compilers may optimize non-atomic objects much more aggressively. while(!data_ready) {} is either:

  • a no-op, if data_ready == true (OK), or
  • an infinite loop (UB), if data_ready == false, because the compiler may assume that data_ready isn't updated by another thread.
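Here is a rough sketch of the transformation the optimizer would be allowed to perform in principle (written as ordinary C++ for illustration; in practice the opaque sleep call may prevent it, as one of the comments below notes):

#include <chrono>
#include <thread>

bool data_ready = false;  // plain, non-atomic flag, as in the question

// The loop as written:
void reader_as_written()
{
    while (!data_ready)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// What the optimizer may turn it into, because it is allowed to assume
// that no other thread modifies the non-atomic data_ready:
void reader_after_hoisting()
{
    if (!data_ready)
    {
        for (;;)  // the flag is never re-read: infinite loop
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
}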

Technically, using std::atomic in this way is also not okay, because the compiler isn't forbidden from assuming that data_ready never changes, even if it is atomic. This code relies on the mantra "no sane compiler would optimize atomics".

Behavior for various architectures in practice

x86_64

You might not experience any symptoms of this UB because on x86_64, aligned memory accesses up to 8 bytes are atomic anyway. However, you shouldn't rely on the architecture making this "okay in practice"; always use atomics.

As stated above, the code is also more likely to get stuck in an infinite loop due to compiler optimizations, regardless of whether there is a data race.

Others

On other architectures, it might be possible that !data_ready will read a nonsense value if another thread is writing to data_ready at the same time. This may be unlikely for something small like bool, but very likely for larger types like char[1024], which are also POD/trivially copyable, but cannot be written to memory in one operation.
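For reference, here is a minimal sketch of the flag done with std::atomic<bool>. It essentially restores the book's example, with explicit release/acquire ordering instead of the default seq_cst (which is also correct, just stronger than necessary, as one of the comments below points out):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

std::vector<int> data;
std::atomic<bool> data_ready(false);

void reader_thread()
{
    // The acquire load synchronizes with the release store below, so once
    // it reads true, the push_back in writer_thread is guaranteed visible.
    while (!data_ready.load(std::memory_order_acquire))
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::cout << "The answer=" << data[0] << "\n";
}

void writer_thread()
{
    data.push_back(42);
    data_ready.store(true, std::memory_order_release);
}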

Jan Schultke
  • Related topic: provided there is any access whatsoever in the first place, couldn't the compiler just optimize both read and write away? I know atomic!=volatile, but they both share some overlapping traits in the manner of unsequenced access. – alagner Jul 14 '23 at 09:32
  • @alagner only to some extent. The repeated read is extremely dangerous because the compiler can assume that `data_ready` always stays the same, unless we write to it and force some synchronization. This would result in an infinite loop. However, the write cannot be fully optimized away, because it violates: https://eel.is/c++draft/atomics.order#11. Either case works in practice "because no sane compiler would optimize atomics". (until it does :) ) – Jan Schultke Jul 14 '23 at 09:36
  • The processor architecture alone can't make it safe. Compilers might optimize non-atomic variables on the assumption that other threads don't modify them (without synchronization). – HolyBlackCat Jul 14 '23 at 09:44
  • @HolyBlackCat that's true; I've elaborated on how this code would be broken without atomics, even if no data race took place. To be fair, the code is broken even with atomics and should probably use `volatile std::atomic` or force some synchronization. – Jan Schultke Jul 14 '23 at 09:52
  • What is `volatile` for? OP's code looks fine to me (except the sleep loop should probably be `.wait()`). The synchronization is provided by the atomic variable itself, as long as you don't use `memory_order_relaxed` (OP could switch to `memory_order_{acquire,release}` from an unnecessarily strong default `memory_order_seq_cst`). – HolyBlackCat Jul 14 '23 at 10:01
  • @HolyBlackCat as long as you only read from atomics, no synchronization takes place. Even with `memory_order_seq_cst`, the load from `data_ready` can yield an arbitrarily outdated value. If the compiler aggressively optimizes based on this, then it will only read from `data_ready` once and enter an infinite loop if it's `false`. The problem would be the same regardless of whether you use `memory_order_relaxed` or `memory_order_seq_cst`. `volatile` forces the compiler to treat the load as a side effect, so it actually has to take place, which prevents the infinite loop. – Jan Schultke Jul 14 '23 at 10:13
  • Compilers don't remove repeated atomic reads. While it's an interesting question whether they're formally allowed to, I highly doubt they will. The consensus seems to be that volatile isn't needed in multithreading: https://stackoverflow.com/q/4557979/2752075 – HolyBlackCat Jul 14 '23 at 11:23
  • @HolyBlackCat compilers don't aggressively optimize atomics (such as coalescing repeated reads), but that's not because they aren't allowed to, but rather because it's very difficult to implement and it often violates user intent. See [N4455 No Sane Compiler Would Optimize Atomics](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html). Theoretically, `volatile std::atomic` is required in this instance, but all implementations treat atomics as quasi-volatile. – Jan Schultke Jul 14 '23 at 12:13
  • As I see it, p11 would forbid the compiler from transforming `while (!data_ready) { }` into `if (!data_ready) while(1) { }`, just as it forbids deleting the store. In order to make atomic stores visible to atomic loads, you have to ensure that the loads as well as the stores both actually happen. It would be fine to coalesce *finitely* many repeated reads together, but not *infinitely* many. – Nate Eldredge Jul 14 '23 at 14:51
  • Also, if we are parsing the standard that strictly, it's hard to conclude that `volatile` would be guaranteed to help; all it adds is the "evaluated strictly according to the rules of the abstract machine" which is even more vague and ill-defined than anything written about atomics. – Nate Eldredge Jul 14 '23 at 14:54
  • @NateEldredge I don't think that you can read it this way. The standard just says that "atomic loads should be made visible", which only implies that the stores shouldn't be buffered indefinitely, but made visible to other threads quickly. The standard doesn't require "atomic loads to make an effort to read a recent value". I believe it would be perfectly legal to coalesce an infinite number of memory accesses when there is no synchronization, like in this case. – Jan Schultke Jul 14 '23 at 15:11
  • Upon repeated reading, I can see how you can interpret it as "atomic loads should be using recent atomic stores" or so. I agree that there are some wording issues, both for `volatile` and for atomic. This is deep in *"no sane implementation would"* territory. – Jan Schultke Jul 14 '23 at 15:19
  • We might need to agree to disagree. But what do you have in mind when you say "force synchronization"? What would that look like? I have a hard time seeing how any of the synchronization / happens-before rules could get you out of this pickle. – Nate Eldredge Jul 14 '23 at 15:20
  • @NateEldredge perhaps you could force it with `std::atomic_thread_fence`, or you could use a RMW operation, which is always required to read the most recent value. There are few things we can use to obtain the most recent value tbf, so the best solution is probably not to play this silly game and use a threading primitive that was intended for waiting on a condition, like `std::condition_variable`, instead of busy-waiting. – Jan Schultke Jul 14 '23 at 15:40
  • The reason this might happen to compile to safe x86-64 asm is that loads in x86-64 asm have acquire semantics, so it's safe to read `data[0]` after an earlier load sees `data_ready == true`. In this case, `thread::sleep` inside the loop probably stops compilers from turning the non-atomic loop into `if (!data_ready) { while(42){...} }`, since `sleep` is probably a black box for the optimizer; a non-inline function call that must be assumed to read/write any global. If `data_ready` were `static`, the compiler would know it's not globally reachable, but it would still have to assume `sleep` could call `writer_thread()`. – Peter Cordes Jul 14 '23 at 16:26
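As one of the comments above suggests, the cleanest fix is to avoid busy-waiting altogether and use a primitive intended for waiting on a condition. Here is a minimal sketch with std::condition_variable; this variant is an illustration of that suggestion, not code from the question or the book:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <vector>

std::vector<int> data;
bool data_ready = false;        // protected by m, so there is no data race
std::mutex m;
std::condition_variable cv;

void reader_thread()
{
    std::unique_lock<std::mutex> lock(m);
    // wait() releases the lock while blocked and re-checks the predicate
    // whenever it is notified, so there is no busy loop at all.
    cv.wait(lock, [] { return data_ready; });
    std::cout << "The answer=" << data[0] << "\n";
}

void writer_thread()
{
    {
        std::lock_guard<std::mutex> lock(m);
        data.push_back(42);
        data_ready = true;
    }
    cv.notify_one();
}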