Say we have two threads: one reads a bool in a loop and the other can toggle it at certain times. Personally I think this should be atomic because sizeof(bool) in C++ is 1 byte and you don't read/write bytes partially, but I want to be 100% sure.

So yes or no?

EDIT:

Also for future reference, does the same apply to int?

R. Martinho Fernandes
szx
  • Isn't anything less than a word size of the underlying architecture both *atomic* and also *less efficient* than possible? – Cris Stringfellow Jan 31 '13 at 11:37
  • http://stackoverflow.com/questions/8037289/is-mutex-required-for-1-byte-shared-memory suggests it's non-atomic. – Lightness Races in Orbit Jan 31 '13 at 11:37
  • http://stackoverflow.com/questions/8517969/is-this-the-correct-way-to-atomically-read-and-write-a-bool suggests it's atomic "in most machines". – Lightness Races in Orbit Jan 31 '13 at 11:39
  • http://stackoverflow.com/questions/9585966/is-read-write-of-a-bool-value-guaranteed-to-be-one-instruction-in-c-c asks the same question, but the answers stick to the C++ layer. – Lightness Races in Orbit Jan 31 '13 at 11:40
  • Read Intel's Software Developer's Manual; it specifies exactly under what circumstances which kinds of writes/reads are atomic (e.g. if properly aligned, even 64-bit writes are atomic). Note how things change if your type does not occupy all bits, that is, if your bool is part of a bitfield. – PlasmaHH Jan 31 '13 at 11:42
  • By the way, I'm not aware of any requirement in the standard that mandates `sizeof(bool)`. – Lightness Races in Orbit Jan 31 '13 at 11:42
  • @LightnessRacesinOrbit: 5.3.3 even has a note about how they are implementation defined. – PlasmaHH Jan 31 '13 at 11:46
  • Yes, sizeof(bool) is implementation defined. I have worked on architectures where sizeof(bool) == 4. – Brian Neal May 01 '13 at 15:33
  • semi-related [Can modern x86 hardware not store a single byte to memory?](//stackoverflow.com/q/46721075) - in asm, byte stores are atomic without disturbing surrounding bytes. In C++ you still need `std::atomic` with at least `memory_order_relaxed`, not necessarily the default `mo_seq_cst`, to get safe asm. – Peter Cordes Jul 05 '19 at 00:47

3 Answers

There are three separate issues that "atomic" types in C++11 address:

  1. tearing: a read or write involves multiple bus cycles, and a thread switch occurs in the middle of the operation; this can produce incorrect values.

  2. cache coherence: a write from one thread updates its processor's cache, but does not update global memory; a read from a different thread reads global memory, and doesn't see the updated value in the other processor's cache.

  3. compiler optimization: the compiler shuffles the order of reads and writes under the assumption that the values are not accessed from another thread, resulting in chaos.

Using std::atomic<bool> ensures that all three of these issues are managed correctly. Not using std::atomic<bool> leaves you guessing, with, at best, non-portable code.
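
A minimal sketch of the scenario from the question, assuming one reader thread that polls the flag and one writer that clears it. The names `running` and `count_until_stopped` are hypothetical, chosen only for illustration. `memory_order_relaxed` is used because only the flag itself is shared here; the default `seq_cst` would work as well.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical shared flag for illustration.
std::atomic<bool> running{true};

// Starts a reader that spins on the flag, then clears the flag from the
// calling thread. Returns how many times the reader saw `true`.
unsigned long long count_until_stopped() {
    running.store(true);
    unsigned long long iterations = 0;

    std::thread reader([&iterations] {
        // Every iteration performs a real load of the flag; the compiler
        // may not cache it in a register as it could with a plain bool.
        while (running.load(std::memory_order_relaxed)) {
            ++iterations;
        }
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    running.store(false, std::memory_order_relaxed);  // the write the reader waits for

    reader.join();  // join synchronizes, so reading `iterations` below is safe
    assert(!running.load());
    return iterations;
}
```

With a plain `bool` in place of `std::atomic<bool>`, the same loop would be a data race, and an optimizing compiler would be allowed to read the flag once and spin forever.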

Pete Becker
  • Isn't there also reordering of CPU instructions (or memory accesses) at run time? A compiler may reorder loads and stores, but a CPU can do that too. – Roman Kruglov Sep 06 '18 at 12:48
  • @RomanKruglov: on x86, only StoreLoad reordering is possible (https://preshing.com/20120515/memory-reordering-caught-in-the-act/), so only seq-cst stores need extra ordering beyond blocking compile-time reordering. (e.g. `mov`+`mfence`, or better `xchg` to implement seq-cst stores.) In general on other ISAs, yes loads, stores, and RMWs may need extra barriers if they're not done with `mo_relaxed`. – Peter Cordes Jul 04 '19 at 23:40
  • Cache coherency is not the problem; normal systems are already coherent (using MESI or a variant). What `atomic` actually needs to do is stop the compiler from keeping values in *registers*, which are thread private. ([MCU programming - C++ O2 optimization breaks while loop](//electronics.stackexchange.com/a/387478)). Also, for seq-cst stores on x86, to stall the current thread until the store becomes globally visible (e.g. by using `xchg` or `mfence`) before later loads can run. Global visibility *would* happen on its own, but potentially after later loads. – Peter Cordes Jul 04 '19 at 23:43
  • See also [Why is integer assignment on a naturally aligned variable atomic on x86?](//stackoverflow.com/a/36685056) and [Can num++ be atomic for 'int num'?](//stackoverflow.com/q/39393850) – Peter Cordes Jul 04 '19 at 23:44
  • See also [Myths Programmers Believe about CPU Caches](https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/) re: manual coherency. C++ is designed around the assumption of *coherent* shared memory so all you need to do is make sure the store or load actually happens in asm, not keeping a value in a register. On a hypothetical machine with non-coherent shared memory, every synchronizes-with would have to flush everything (or need a lot of tracking), but I'm not aware of any C++ implementations for standard threads with non-coherent shared memory. – Peter Cordes Oct 20 '19 at 19:51
  • I agree with your conclusion: use `atomic` with at least `memory_order_relaxed`, if not the default `seq_cst`. But some of your reasoning for why doesn't hold up. Point 2 is highly misleading because no real CPUs are like that. – Peter Cordes Oct 20 '19 at 19:56

It all depends on what you actually mean by the word "atomic".

Do you mean "the final value will be updated in one go"? (Yes, on x86 that's definitely guaranteed for a byte value, and for any correctly aligned value up to 64 bits at least.) Or do you mean "if I set this to true (or false), no other thread will read a different value after I've set it"? (That's not quite such a certainty; you need a "lock" prefix to guarantee that.)
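
The difference between a plain store and a locked operation shows up most clearly with read-modify-write operations. As a rough sketch (the helper name `count_with_two_threads` is hypothetical), two threads increment a shared `std::atomic<int>`; on x86, compilers typically implement `fetch_add` with a `lock`-prefixed instruction, although that is an implementation detail and not something the C++ standard promises.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: two threads each add 1 to a shared counter
// `per_thread` times and the final total is returned.
int count_with_two_threads(int per_thread) {
    std::atomic<int> counter{0};

    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // Atomic read-modify-write: no increment can be lost, even
            // though both threads hit the same variable concurrently.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    return counter.load();
}
```

Calling `count_with_two_threads(100000)` returns `200000`. With a plain `int` and `++counter`, the program would have a data race and updates could be lost, even though each individual aligned load and store is atomic on x86.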

Mats Petersson
  • "if I set this to true (or false), no other thread will read a different value after I've set it". I think the question is pretty clear. This latter interpretation doesn't have anything to do with atomicity. – jberryman Jan 23 '16 at 23:45
  • @jberryman: The problem comes with caches as well as the compiler optimising the read of the memory. The `b = false;` in some thread, does not guarantee that all other threads, in their next case of `if (b) ...`, will pick up that `b` is false. This requires that the compiler hasn't optimised the access to `b` into `tmp = b; ... if (tmp) ...` [where `tmp` is a register]. Depending on the code inside the thread, there are situations when a compiler WILL do this. – Mats Petersson Jan 24 '16 at 09:41
  • *no other thread will read a different value after I've set it* - `mfence` or a `lock` prefix are only needed to nail down the meaning of "after". Memory is coherent on all x86 systems, so after the store instruction eventually commits to L1d cache, no other thread can read the old value. You only need barriers to implement a seq-cst store and make sure *this* thread doesn't do any other loads before the store is globally visible. It definitely *will* become globally visible on its own very soon. [Can I force cache coherency on a multicore x86 CPU?](//stackoverflow.com/a/558888) – Peter Cordes Jul 05 '19 at 02:05
  • TL:DR: a barrier doesn't explicitly flush or write-back cache, it only stalls this thread until the value commits from the store buffer to this core's L1d cache (and thus becomes globally visible.) – Peter Cordes Jul 05 '19 at 02:06
  • *"if I set this to true (or false), no other thread will read a different value after I've set it" (that's not quite such a certainty - you need a "lock" prefix to guarantee that).* - A *bool* can have any hardware implementation, and another thread can read any state of the *bool*, maybe even a "partial" one, but the value read is then **interpreted** as *true* or *false*, so there cannot be any "different value". In this sense a read of a bool is always "atomic": we always get *true* or *false* and never something different. A lock is needed only for an RMW operation or when we need ordering between this bool and other memory. – RbMm Dec 24 '19 at 13:39

x86 only guarantees atomicity for word-aligned reads and writes of word size. It does not guarantee it for any other operations unless they are explicitly atomic. Plus, of course, you have to convince your compiler to actually issue the relevant reads and writes in the first place.
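
Rather than reasoning about what a particular ISA guarantees, one option is to ask the implementation whether its atomic types map directly onto hardware operations. The sketch below uses the C++11 `is_lock_free()` member; the helper name `bool_and_int_are_lock_free` is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: reports whether both std::atomic<bool> and
// std::atomic<int> are implemented without a hidden lock on this platform.
bool bool_and_int_are_lock_free() {
    std::atomic<bool> flag{false};
    std::atomic<int> number{0};
    return flag.is_lock_free() && number.is_lock_free();
}
```

On mainstream x86 toolchains this is expected to return `true` for both types, which also covers the follow-up about `int`. Even where it returns `false`, `std::atomic` remains correct; it just falls back to a lock internally.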

Puppy
  • Does x86 guarantee cache coherency? – choxsword Nov 07 '18 at 14:11
  • @bigxiao: yes, every normal SMP system regardless of ISA guarantees cache coherency, and uses MESI (or some variant) to achieve it. Part of what `atomic` does is stop the compiler from keeping values in *registers* instead of memory, because registers are thread-private. But memory is *always* coherent. You only need barriers if you want ordering between loads and stores, e.g. to make the current thread wait until a store is visible before doing later reads. Making a store globally visible always happens as quickly as possible regardless of barriers. (commit from store buffer to L1d) – Peter Cordes Jul 04 '19 at 23:34
  • x86 guarantees slightly more, e.g. byte loads/stores are always atomic, and 16-bit loads/stores that don't cross a 4-byte boundary are also atomic. And dword (32-bit) aligned loads/stores are atomic. Also, on modern x86 (AMD and Intel P6 and later) cached loads/stores of any width are atomic as long as they don't cross an 8-byte boundary. [Why is integer assignment on a naturally aligned variable atomic on x86?](//stackoverflow.com/a/36685056) So yes, on x86 all `std::atomic<>` has to do for pure loads / pure stores is make sure values are naturally aligned, and not optimized away. – Peter Cordes Jul 04 '19 at 23:38