5

Suppose I have lots of threads and a plain, trivially copyable, non-array variable of a basic type (float, uint16_t, etc.) called flag. Exactly one thread sets the variable's value often, while the others only read it and never write to it. Do I have to make the variable atomic or guard it with a mutex in this case? I know I must protect a variable when multiple threads write to it, but is that necessary in my case? Is it platform-dependent?

  • 2
    https://en.cppreference.com/w/cpp/language/memory_model#Threads_and_data_races – Mat Aug 08 '23 at 20:45
  • 5
    @binaryescape ... unless of course you have a CPU that doesn't update a simple variable atomically. In any event, I believe doing this is formally UB. – Paul Sanders Aug 08 '23 at 20:45
  • 7
    Atomicity is not your only threat here. A compiler may recognize that nothing in thread X ever changes the value and optimize out reads of the value to save time. – user4581301 Aug 08 '23 at 20:46
  • It depends on how the variable is used. If the timing of the writes has no particular meaning to the other threads, then it should be fine to not use synchronization. – user1806566 Aug 08 '23 at 20:46
  • 1
    @user1806566: since the other threads are reading the value, clearly it has meaning. So synchronization is needed. – Mooing Duck Aug 08 '23 at 20:50
  • 2
    You still need to make it atomic or use a mutex, because you need to ensure synchronization between the writing and the reading threads. Otherwise, compiler can optimize the code to read only once, cache the result, and never read again. See [this answer](https://stackoverflow.com/a/68550414/1458097). – heap underrun Aug 08 '23 at 20:51
  • @binaryescape: Even with a CPU with atomic writes, synchronization would still be needed, as the entire trivially-copyable struct is almost certainly not written atomically. – Mooing Duck Aug 08 '23 at 20:51
  • @heapunderrun does making variable `volatile` help? – postcoital.solitaire Aug 08 '23 at 20:55
  • 1
    _"...When an evaluation of an expression writes to a memory location and another evaluation __reads or modifies__ the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless..."_ https://en.cppreference.com/w/cpp/language/memory_model "_...If a data race occurs, the behavior of the program is undefined...."_ – Richard Critten Aug 08 '23 at 20:56
  • 5
    @mollis_cactus [`volatile`](https://en.cppreference.com/w/cpp/language/cv) is **not** a synchronization primitive. – heap underrun Aug 08 '23 at 20:58
  • @RichardCritten so reading it simultaneously from multiple threads while knowing it is not being modified is not an issue? – postcoital.solitaire Aug 08 '23 at 21:03
  • @mollis_cactus That's what a mutex is for! But `std::atomic` is cheaper for primitive types. – Paul Sanders Aug 08 '23 at 21:10
  • @PaulSanders yep, but what if I'm absolutely sure it's not modified? Like say **all** threads go through some "state 1" during which they can only write to variables and then go through "state 2" when no writes are allowed, given these states don't overlap in time in all threads – postcoital.solitaire Aug 08 '23 at 21:19
  • 1
    @mollis_cactus you wrote _"...One and only one of the threads often sets the variable's value.."_ so there is modification and reading. So without synchronisation there is a data-race and a data-race is defined by the Standard to be Undefined Behaviour. – Richard Critten Aug 08 '23 at 21:26
  • 2
    A handy resource for checking if `volatile` will be sufficient for synchronising with c or c++ threads is the detailed page at http://isvolatileusefulwiththreads.com/c/ – Mike Vine Aug 08 '23 at 21:46
  • 1
    @mollis_cactus Why try to break the rules? Multi-threaded programming is hard enough as it is without introducing random and sporadic bugs by sloppy coding practises. – Paul Sanders Aug 08 '23 at 21:59
  • 1
    @mollis_cactus in that case [the comment above where the write happens before any reads] it's well-defined behavior and you don't need a lock. You can read a variable from as many threads as you like; there is only an issue if one or more of them writes at the same time. – Mike Vine Aug 08 '23 at 22:02

1 Answer

7

Compilers are free to optimize this:

while (x != 0) {
  // code it knows does not modify x, nor synchronize with other threads
}

into

if (x!=0) {
  while (true) {
    // code it knows does not modify x
  }
}

i.e., check x once and from then on assume it cannot change.

If x is atomic, however, they are not allowed to do this, because each read can observe a write made by another thread, so the compiler cannot assume the value stays the same.
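As a rough illustration of the atomic case, here is a minimal sketch of the same loop with `x` declared as `std::atomic<int>`. The function names `spin_until_cleared` and `run_demo`, and the 10 ms delay, are made up for this example and are not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical shared variable: atomic, so concurrent read/write is not a data race.
std::atomic<int> x{1};

// Reader: spins until some other thread stores 0 into x.
long spin_until_cleared() {
    long iterations = 0;
    // Every x.load() is a genuine atomic read; the compiler cannot rewrite
    // this as "check once, then loop forever" the way it may for a plain int.
    while (x.load() != 0) {
        ++iterations;
    }
    return iterations;
}

// Starts one reader, lets it spin briefly, then clears x from this thread.
bool run_demo() {
    x.store(1);
    long seen = -1;
    std::thread reader([&] { seen = spin_until_cleared(); });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    x.store(0);     // the single writer
    reader.join();  // join() synchronizes, so reading `seen` here is safe
    assert(seen >= 0);
    return x.load() == 0;
}
```

Calling `run_demo()` from `main` should return once the reader has observed the store. With a plain `int x`, the same program would contain a data race and no such expectation would hold.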

In general, reading a variable in one thread while another thread writes to it, without synchronization between the two, is a data race, and a data race is UB under the current C++ memory model.

This UB can be both a hardware issue and permission for the compiler to optimize your code in ways you might feel are hostile.

So yes, you need to tell the compiler that your reads/writes are possibly going to involve inter-thread communication.
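The question also asks about guarding the variable with a mutex. A minimal sketch of that option for one writer and several readers might look like the following. `GuardedFloat` and `run_guarded_demo` are hypothetical names, and the thread and iteration counts are arbitrary. For a basic type, `std::atomic<float>` would usually be the lighter choice.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: a single float whose every access takes the same mutex.
class GuardedFloat {
public:
    void set(float v) {
        std::lock_guard<std::mutex> lock(m_);
        value_ = v;
    }
    float get() const {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }
private:
    mutable std::mutex m_;  // mutable so that get() can stay const
    float value_ = 0.0f;
};

// One writer updates the value often; four readers only read it.
bool run_guarded_demo() {
    GuardedFloat flag;
    std::vector<float> last_seen(4, -1.0f);  // one slot per reader
    std::thread writer([&] {
        for (int i = 1; i <= 1000; ++i) flag.set(static_cast<float>(i));
    });
    std::vector<std::thread> readers;
    for (std::size_t r = 0; r < last_seen.size(); ++r) {
        readers.emplace_back([&, r] {
            for (int i = 0; i < 1000; ++i) last_seen[r] = flag.get();
        });
    }
    writer.join();
    for (auto& t : readers) t.join();
    // Each reader only ever saw the initial 0 or a value the writer stored.
    for (float v : last_seen) assert(v >= 0.0f && v <= 1000.0f);
    return flag.get() == 1000.0f;
}
```

Both the lock and an atomic tell the compiler and the hardware that the variable is shared between threads; the mutex version simply extends more naturally to data that is wider than one basic type.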

The fun part is that your code might work today, on your hardware with your current compiler. Then a few years from now, an innocuous operating system update, compiler update or hardware revision will make your code fail. What's more, the failure case might be rare, and might even happen today!

You can't prove your code to be correct, because I can prove your code incorrect. You can't prove that the compiler (assuming it doesn't have bugs) will forevermore generate valid assembly that does what you ask, because the standard says your code exhibits UB.

You might be able to take the assembler output of a particular run of your compiler and prove that the produced code is correct under the guarantees your CPU manufacturer gives for those instructions. But I've seen CPU instruction descriptions, and good luck with that.

Yakk - Adam Nevraumont
  • To prevent the "optimization" or code transformation you gave as an example, C++ has the keyword `volatile`. And indeed, that would be enough to prevent compiler escapades. Thus, that reason to have to use mutexes is settled. On single core machines, you would be done. On multicore, however, this is still not good enough and you need "fences". But still not a mutex in this case. Atomics (Interlocked API in Windows) can provide those fence mechanisms. For single writer, multiple reader, that is good enough. – BitTickler Aug 09 '23 at 08:37
  • @BitTickler No, `volatile` does not make race-condition UB stop being UB, despite what people say. The story about the optimization is just a story - once a program exhibits UB, the C++ standard no longer constrains its behavior. And that includes any wording around `volatile`. It may, with current compilers, OSes and hardware, make the program seem to work - but your program remains provably unconstrained by the C++ standard in what it actually does. – Yakk - Adam Nevraumont Aug 09 '23 at 13:30
  • Welcome to the margin between probably work and guaranteed to work. – user4581301 Aug 09 '23 at 19:46
  • There is also a lot of UB, which is as it is because else, what is done would be impossible in that lawyers-paradise language. And slinging the UB words around is not helping anyone. If what you say is in the spirit of the authors and maintainers of the language, they could as well remove the volatile keyword, which is in the standard EXACTLY for that reason - to prevent optimizations, which would otherwise lead to faulty behavior, such as read/write to a hardware register. – BitTickler Aug 10 '23 at 05:04
  • @BitTickler `volatile` does not exist to handle multi-threaded situations in C++. The C++ memory model does not give any special treatment to it. The C++ memory model is robust enough that GPU manufacturers are stealing it in order to give their hardware a sensible memory model! But, it intentionally doesn't use `volatile` for multi-threaded access - it could have, and chose not to. You are free to try to hack your way around the memory model; prior to the C++ memory model being added to the standard, there wasn't an alternative, and using `volatile` was common. – Yakk - Adam Nevraumont Aug 10 '23 at 13:06
  • I differentiated between single core and multi core in my original statement. And I stated, that it is good enough for single core and that you need fences for multi core. What nonsense would it be if volatile worked for hardware registers, but not for main memory in the single core case? The "fences" serve to manage cache coherency first and foremost. `volatile` tells the compiler not to optimize away read or write accesses. Only a fence and no volatile is just as suspicious, so please stop bashing poor `volatile`. – BitTickler Aug 11 '23 at 03:50
  • @BitTickler: Yes, `volatile` works somewhat like `std::atomic` with `memory_order_relaxed` for types no wider than a register, on existing implementations (such as GCC) that go out of their way to support this for compatibility with legacy pre-C++11 code, thanks to compiler behaviour and coherent caches. There's usually no advantage to using `volatile` vs. `relaxed` atomics. See *[When to use volatile with multi threading?](https://stackoverflow.com/a/58535118)*. If you want more ordering than `relaxed`, you can often get it more efficiently with `atomic` with `acq`/`rel`. – Peter Cordes Aug 11 '23 at 06:55
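
The `acq`/`rel` ordering mentioned in the last comment can be sketched roughly as follows: the writer fills in ordinary data and then publishes it with a release store, and the reader waits with acquire loads before touching that data. The names `payload`, `ready` and `publish_and_read`, and the value 42, are assumptions made for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes payload

// Writer (the calling thread) publishes payload; the reader thread picks it up.
int publish_and_read() {
    payload = 0;
    ready.store(false);
    int observed = -1;
    std::thread reader([&] {
        // Acquire load: once it reads true, everything the writer did
        // before its release store is visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {
        }
        observed = payload;  // no data race: ordered after the writer's write
    });
    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // publish it
    reader.join();
    assert(observed == 42);
    return observed;
}
```

With `memory_order_relaxed` on both sides the flag itself would still be race-free, but the read of `payload` would no longer be ordered after the write, so the pairing of release and acquire is what makes the plain variable safe to read.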