
In a Windows/Linux multithreaded C program, suppose an `unsigned int` global variable is accessed by three threads without any synchronization, where:

  • Thread 1 writes the value 0
  • Thread 2 writes the value 0xFFFFFFFF
  • Thread 3 reads the value

Question

Is it possible for Thread 3 to retrieve a partial value, say 0x0000FFFF from the global variable?

I've always assumed that if an unsigned int is properly aligned, a write operation is atomic, so in this case, Thread 3 would always read either 0 or 0xFFFFFFFF.

vengy
  • You can't assume that writes to `unsigned int` will be atomic. This is what `sig_atomic_t` is for. – Barmar Aug 18 '23 at 20:47
  • If you're asking about C in general, then I agree with @Barmar that you cannot assume it will be atomic. On many systems, though, it would be atomic in practice. But your question doesn't specify that. – pmacfarlane Aug 18 '23 at 21:00
  • https://en.cppreference.com/w/c/language/atomic – Hans Passant Aug 18 '23 at 21:28
  • "_Is it possible for Thread 3 to retrieve a partial value, say 0x0000FFFF from the global variable?_": It is much worse: in many situations of the kind you mention, the compiler is likely to optimize away your intended read or write. It is simply undefined behavior to write or read a non-atomic object while another thread writes to it unsynchronized, and the compiler is going to make use of that UB. And even if it doesn't, there is no guarantee that a non-atomic write will propagate to another thread. It could e.g. be held in a register. Etc... – user17732522 Aug 18 '23 at 21:31
  • In ISO C11 it's data-race UB so literally any behaviour is allowed, but I think you're asking if it's plausible on any real hardware? Obviously a Deathstation 9000 implementation could do that by always storing and/or loading values in 16-bit halves. And there are 16-bit CPUs where there's zero expectation that `uint32_t` is ever naturally atomic. – Peter Cordes Aug 18 '23 at 22:51
  • Also, this is the 32-bit equivalent of [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/q/71866535) where Nate's answer shows not-guaranteed-atomic code-gen for AArch64 storing a repeating constant. I linked this on your previous question ([Best way to atomically bitwise AND a byte in C/C++?](https://stackoverflow.com/posts/comments/135620440)) – Peter Cordes Aug 18 '23 at 22:52
  • This question has come up several times before. It's true that an ordinary load/store instruction of an aligned `unsigned int` sized object is atomic on *some* architectures, maybe even *most*. The problem is that the compiler is not obliged to compile an access to `int x` into a single ordinary load or store instruction. It is allowed to accomplish the access in some other way if it chooses to do so. – Nate Eldredge Aug 19 '23 at 02:01
  • It might for instance decide it's better to load or store `x` in two halves. (Suppose for instance that the value `0x0000ffff` is already in some register, and we can't afford another one to materialize `0xffffffff`, so we do two 16-bit stores instead.) Or, as in ikegami's answer, it might compile the access into no instructions at all, optimizing it away when the data race rules permit this. – Nate Eldredge Aug 19 '23 at 02:02

2 Answers


You forgot one very likely case: that thread 3 doesn't read the value at all, using a previously read value instead.

Synchronization doesn't just prevent partial reads and writes; it also informs the CPU and compiler that the value could be changed by another thread. They can and do assume that won't happen if there's no synchronization, so a cached value could be used, or the read can be optimized away, etc.

(I'm using the term "synchronization" loosely here.)

Given this possibility, it's moot whether you can end up with 0x0000FFFF or not.
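
To make that concrete, here is a minimal sketch, assuming a C11 compiler that provides `<stdatomic.h>`. The names `plain_value`, `atomic_value`, `wait_plain`, `wait_atomic` and `publish` are hypothetical and only serve the illustration. It contrasts a wait loop on a plain global, which the compiler is allowed to reduce to a single read, with one that uses an atomic load on every iteration.

```c
#include <stdatomic.h>

/* Hypothetical names for illustration; none of them come from the question. */
static unsigned int plain_value;           /* plain object: racy if shared unsynchronized */
static _Atomic unsigned int atomic_value;  /* atomic object: may be shared between threads */

/* Racy pattern (shown only for contrast): the compiler may assume that no
   other thread modifies plain_value, so it is allowed to read it once and
   keep testing the copy it holds in a register. */
unsigned int wait_plain(void)
{
    while (plain_value == 0) {
        /* spin */
    }
    return plain_value;
}

/* Each iteration performs an atomic load, so the compiler cannot replace
   the loop with a single read. memory_order_relaxed gives atomicity only,
   not ordering with respect to other variables. */
unsigned int wait_atomic(void)
{
    unsigned int v;
    while ((v = atomic_load_explicit(&atomic_value, memory_order_relaxed)) == 0) {
        /* spin */
    }
    return v;
}

/* Writer side: an atomic store that wait_atomic() can observe. */
void publish(unsigned int v)
{
    atomic_store_explicit(&atomic_value, v, memory_order_relaxed);
}
```

In a real program `publish` would be called from one thread and `wait_atomic` from another. If the reader also needs to see other data written before the store, relaxed ordering is not enough and release/acquire ordering would be the usual choice.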

ikegami
  • *so a cached value could be used, or the read can be optimized away.* - Those are the same thing. If you optimize away reading memory, it's by using a value from a register (which some people confusingly describe as "cached", but that's by software not the CPU hardware's data cache). If the asm has a load instruction, it will read from coherent cache, because all real-world C++ implementations [only run threads across cache-coherent cores](//stackoverflow.com/a/58535118/224132), and it would be [hard](//stackoverflow.com/q/76851091) to implement C++'s mem model on non-coherent shared memory. – Peter Cordes Aug 18 '23 at 22:59
  • (Maybe the compiler could invent a temporary in local stack space and copy to there to reduce contention? Compilers generally don't know which globals to expect contention from, so won't do that.) Anyway, if you want synchronization (like a happens-before relationship) as well as lack of tearing, you need memory *ordering*, not just atomicity. – Peter Cordes Aug 18 '23 at 23:01
  • But even if you just want `memory_order_relaxed`, see also [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) on LWN for a catalogue of some of the less-obvious problems that are possible and happen in practice. e.g. invented loads, so if the source does `int tmp = global;`, one use of `tmp` might see one value, a later use of `tmp` might see another value if the compiler chose to reload the global instead of copying it. That's different from tearing within one value, though. – Peter Cordes Aug 18 '23 at 23:02

Is it possible for Thread 3 to retrieve a partial value, say 0x0000FFFF from the global variable?

Your program contains a data race, therefore its behavior is undefined as far as C is concerned. In principle, yes, thread 3 could observe a value different from the variable's initial value and from the values written by threads 1 and 2; the language places no limit on what that value might be.

I've always assumed that if an unsigned int is properly aligned, a write operation is atomic, so in this case, Thread 3 would always read either 0 or 0xFFFFFFFF.

It is a mistake to try to predict the behavior of a C program based on your own model of how its source code corresponds to lower level instructions. And you do not need to attempt that if you write code whose behavior is well defined by the C language spec.

It may nevertheless be the case that your particular C implementation will never exhibit the kind of shearing (tearing) you ask about under the circumstances you describe, but even if so, that does not make your racy code safe. Its behavior is still undefined, and it might still exhibit misbehavior in practice, such as thread 3 never observing either of the other two threads' writes.
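
For comparison, a well-defined version of the scenario could look roughly like the sketch below. It assumes C11 `<stdatomic.h>`, POSIX threads (on Windows, Win32 threads or C11 `<threads.h>` would take their place) and a 32-bit `unsigned int`. The names `shared`, `write_zero`, `write_ones`, `read_value` and `run_once` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared variable; _Atomic makes concurrent access well defined. */
static _Atomic unsigned int shared = 0;

/* Thread 1: writes 0. */
static void *write_zero(void *arg)
{
    (void)arg;
    atomic_store(&shared, 0u);
    return NULL;
}

/* Thread 2: writes 0xFFFFFFFF. */
static void *write_ones(void *arg)
{
    (void)arg;
    atomic_store(&shared, 0xFFFFFFFFu);
    return NULL;
}

/* Thread 3: reads the value and hands it back through arg. */
static void *read_value(void *arg)
{
    *(unsigned int *)arg = atomic_load(&shared);
    return NULL;
}

/* Runs the three threads once and returns what the reader observed. */
unsigned int run_once(void)
{
    pthread_t t1, t2, t3;
    unsigned int seen = 0;

    if (pthread_create(&t1, NULL, write_zero, NULL) != 0) abort();
    if (pthread_create(&t2, NULL, write_ones, NULL) != 0) abort();
    if (pthread_create(&t3, NULL, read_value, &seen) != 0) abort();

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);  /* after the join, reading seen here is safe */
    return seen;
}
```

With atomic accesses, the reader can only observe the initial value or one of the stored values, so the result is either 0 or 0xFFFFFFFF. Which of the two it sees still depends on scheduling.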

John Bollinger