41

I need to set a flag for another thread to exit. That other thread checks the exit flag from time to time. Do I have to use an atomic for the flag, or is a plain bool enough, and why (with an example of what exactly may go wrong if I use a plain bool)?

#include <future>
bool exit = false;
void thread_fn()
{
    while(!exit)
    {
        //do stuff
        if(exit) break;
        //do stuff
    }
}
int main()
{
    auto f = std::async(std::launch::async, thread_fn);
    //do stuff
    exit = true;
    f.get();
}
Peter Cordes
PowerGamer
  • Near duplicate of [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/a/387478) / [Multithreading program stuck in optimized mode but runs normally in -O0](https://stackoverflow.com/q/58516052) – Peter Cordes Jan 11 '23 at 09:51

3 Answers

35

Do I have to use atomic for “exit” bool variable?

Yes.

Either use atomic<bool>, or use manual synchronization through (for instance) an std::mutex. Your program currently contains a data race, with one thread potentially reading a variable while another thread is writing it. This is Undefined Behavior.

Per Paragraph 1.10/21 of the C++11 Standard:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

The definition of "conflicting" is given in Paragraph 1.10/4:

Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
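
For illustration, here is a minimal sketch of the question's code with the flag made atomic (the flag is renamed to exit_flag here only to avoid a potential clash with ::exit from <cstdlib>; the structure is otherwise unchanged):

#include <atomic>
#include <future>

std::atomic<bool> exit_flag{false};  // shared stop flag, now atomic

void thread_fn()
{
    while (!exit_flag.load())        // atomic read: no data race, re-read on every iteration
    {
        // do stuff
        if (exit_flag.load()) break;
        // do stuff
    }
}

int main()
{
    auto f = std::async(std::launch::async, thread_fn);
    // do stuff
    exit_flag.store(true);           // atomic write: guaranteed to become visible to thread_fn
    f.get();
}

A plain bool protected by a std::mutex (locked around every read and write of the flag) would be equally correct; the atomic is simply less code.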

Andy Prowl
  • 1
    I can hardly see how accessing a `bool` can be anything but atomic in practice (though I agree in formal standardese it's a whole other matter). Anyway, aside from the race condition itself, I believe one also needs a memory barrier (which is provided by `atomic<>` or `std::mutex`) to ensure the compiler doesn't cache the data or reorder the instructions. However I lack the theoretical knowledge to explain that correctly (even though I know how to use it in practice), care to enlighten me if you can? Or should I create a new question? – syam Apr 19 '13 at 19:12
  • 22
    To elaborate (narrowly), the data-race condition in the standard isn't just about what one thread can do that is visible to another thread. It's also about optimizations or code reordering. A compiler can assume no data races, so it can assume that if one thread reads a variable but doesn't modify it, the value can't change between synchronizing points. Your thread may never exit. Or if a thread sets a flag without using it, the modification may be re-ordered to come before any other code (after the last synchronizing point). I'm even ignoring cache coherency. This is why atomics exist. – Adam H. Peterson Apr 19 '13 at 19:13
  • @Andy Quotes precisely to the point - enough for me to go use atomic. Hopefully Pete in his answer will clarify the description of the actual problems in my example code that can arise from using a plain bool. – PowerGamer Apr 19 '13 at 19:43
  • 2
    @PowerGamer: This "I'm doing what I want, against the normal advice, unless proven wrong" approach is a habit you should probably try to break... – GManNickG Apr 19 '13 at 20:46
  • 1
    Does replacing `bool` with `std::atomic_bool` here fix the code and make it portable and reliable? – Violet Giraffe Apr 01 '15 at 10:36
  • 1
    @VioletGiraffe: Yes, it does. – Andy Prowl Apr 01 '15 at 10:50
  • "volatile bool" will have the same effect without adding . volatile means that the variable may be changed. Thus, unless the compiler has a bug, will not be cached. – rxantos Mar 29 '22 at 17:03
  • @rxantos, unfortunately no. "[`volatile`] *only guarantees that instructions are not omitted and the instruction ordering is preserved.* `volatile` *does not guarantee a memory barrier to enforce cache coherence.*" - https://stackoverflow.com/questions/29633222/ – Wololo Aug 09 '23 at 10:03
15

Yes, you must have some synchronization. The easiest way is, as you say, with atomic<bool>.

Formally, as @AndyProwl says, the language definition says that not using an atomic here gives undefined behavior. There are good reasons for that.

First, a read or write of a variable can be interrupted halfway through by a thread switch; the other thread may see a partly-written value, or if it modifies the value, the original thread will see a mixed value.

Second, when two threads run on different cores, they have separate caches; writing a value stores it in the cache, but doesn't update other caches, so a thread might not see a value written by another thread.

Third, the compiler can reorganize code based on what it sees; in the example code, if nothing inside the loop changes the value of exit, the compiler doesn't have any reason to suspect that the value will change; it can turn the loop into while(1).

Atomics address all three of these problems.
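
To illustrate the third point, here is a sketch (assuming the optimizer can see that nothing in the loop body writes the flag) of what a conforming compiler is allowed to turn the question's loop into when the flag is a plain, non-atomic bool; the names exit_flag and thread_fn_as_optimized are illustrative:

bool exit_flag = false;   // plain, non-atomic flag

// Legal "as-if" transformation: the flag is read once, before the loop, and
// never re-read, because the compiler may assume there is no data race and it
// can see that nothing in the loop body modifies the flag.
void thread_fn_as_optimized()
{
    if (!exit_flag)
    {
        for (;;)          // effectively while(1): exit_flag = true is never observed
        {
            // do stuff   (the inner `if (exit_flag) break;` folds away too)
            // do stuff
        }
    }
}

As the comments below point out, making the flag std::atomic<bool> forbids this transformation, and for a simple stop flag even loads and stores with memory_order_relaxed are sufficient.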

Pete Becker
  • 1
    Can you please elaborate on your answer a bit more: How exactly can the 1st issue be a problem for a bool variable with only two values, 0 or 1 (there are no "half" values)? 2nd issue: are you saying that the thread writing into the "exit" var can write into the cache and the value from the cache will never reach the memory where the "exit" var is located? 3rd issue: you mean that the compiler can (theoretically) be so smart as to see that a) before the call to thread_fn() "exit" is always true and b) nothing that thread_fn() calls changes "exit" - and that's what gives the compiler the right to change the loop, correct? – PowerGamer Apr 19 '13 at 19:31
  • 1
    @PowerGamer - yes, tearing of a bool variable is unlikely. I won't go through perverse scenarios where it hypothetically could happen. For other types it can and will happen if you don't take steps to prevent it. – Pete Becker Apr 19 '13 at 19:38
  • Hi, @PeteBecker. I am confused a little bit about the following reason: "Second, when two threads run on different cores, they have separate caches; writing a value stores it in the cache, but doesn't update other caches, so a thread might not see a value written by another thread.". Doesn't cache coherency fix this issue? – eaniconer Oct 18 '19 at 06:40
  • @eaniconer Cache coherency is a processor implementation detail/term. The fact that it exists doesn't mean that all the memory is visible consistently by all threads at any given time without any actions from you. There are some rules to adhere to. As a programmer you should rely on the memory model the language spec gives to you to ensure visibility of changes. And this model says that if you write an atomic variable in one thread, then, if you read that value in another thread, not only will this change propagate to the other thread but all the changes made before that write as well. – vehsakul Oct 21 '19 at 01:17
  • @eaniconer: Yes, it does, that's why `atomic_bool.load(relaxed)` and `.store(true, relaxed)` can compile to plain loads/stores in asm for all real-world machines (which only run threads across cores with coherent caches). See [Why set the stop flag using `memory_order_seq_cst`, if you check it with `memory_order_relaxed`?](https://stackoverflow.com/q/70581645) for more discussion of memory-order and inter-thread latency (surprisingly to some, there's essentially no benefit to using anything stronger than `relaxed` for an exit_now flag). – Peter Cordes Jan 11 '23 at 10:03
-5

Actually, nothing goes wrong with a plain bool in this particular example. The only note is to declare the bool exit variable as volatile, to keep it in memory. Both CISC and RISC architectures implement bool reads/writes as strictly atomic processor instructions. Also, modern multicore processors have advanced smart cache implementations, so no memory barriers are necessary. The Standard citation is not appropriate for this particular case, because here only one thread writes and only one thread reads.
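
For reference, a sketch of the volatile variant this answer proposes (the flag renamed to exit_flag only to avoid a potential clash with ::exit); note that the comments below dispute whether this is guaranteed by the C++ standard, as opposed to merely working on current mainstream hardware:

#include <future>

volatile bool exit_flag = false;   // this answer's suggestion: volatile instead of atomic

void thread_fn()
{
    while (!exit_flag)             // volatile forces the compiler to re-read the flag every iteration
    {
        // do stuff
        if (exit_flag) break;
        // do stuff
    }
}

int main()
{
    auto f = std::async(std::launch::async, thread_fn);
    // do stuff
    exit_flag = true;              // formally still a data race under the C++ memory model
    f.get();
}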

constm
  • 2
    x86 doesn't even _define_ a bool type, so it cannot be atomic either. – MSalters Apr 06 '16 at 12:22
  • You are forgetting about caches. Multi-core systems can have an L1 cache per core, and in an adverse situation a non-atomic bool variable may reside on a cache line that is never invalidated and never reread. Atomics also ensure that proper acquire and release semantics are used for the cache lines. Further, the use of atomic instructs the optimizer and will prevent it from possibly optimizing the per-loop read away (`if (exit) while (1) {}`). – rioki Apr 28 '19 at 15:54
  • @rioki Wouldn't volatile be sufficient to solve the caching issue? – Silicomancer Jan 13 '22 at 21:23
  • The C++ standard does not give you any guarantees on volatile and multiple threads. This MAY work on single x86 CPUs that keep caches coherent, but will break on multiple CPUs or other CPUs or yet-to-come architectures. All volatile guarantees is that one thread of execution will not rearrange writes and reads through optimization. See: https://www.drdobbs.com/parallel/volatile-vs-volatile/212701484 – rioki Jan 14 '22 at 14:01
  • Memory barriers are unnecessary, but stopping the compiler from hoisting a load out of the loop is necessary. So use `atomic` with `memory_order_relaxed`. – Peter Cordes Jan 11 '23 at 09:48
  • @rioki: in practice all real-world C++ implementations only run `std::thread` threads across cores that share coherent cache, which is why OSes like Linux can use `volatile` to roll their own equivalent to `atomic`, with barriers if they need anything more than `relaxed` ordering. Just visibility alone does *not* require anything special in asm, only in C++ where you need to stop compilers from hoisting loads out of loops and stuff. See [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/a/387478) – Peter Cordes Jan 11 '23 at 09:50
  • @rioki: For more about coherent caches, see [this Q&A](https://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading/58535118#58535118). `volatile` would work for an `exit_now` or `keep_running` flag on any mainstream system, including multi-socket PowerPC. Not just "single x86"! – Peter Cordes Jan 11 '23 at 09:53
  • OK, yes, multi-socket CPUs can also have cache coherency protocols. But do you want to write a program that will break when some HW architect decides to relax cache coherency for performance? (Which has already happened in part, see temporary store buffers on Itanium.) The standard makes NO guarantees on volatile. It MAY work on your CPU now. Maintaining decades-old code has taught me: don't assume anything that is not guaranteed. – rioki Jan 11 '23 at 12:03