Is a global variable accessed by three threads (2 writers, 1 reader) without any synchronization potentially undefined?

Question

In a Windows/Linux multithreaded C program, suppose an unsigned int global variable is accessed by three threads without any synchronization, where:

Thread 1 writes the value 0
Thread 2 writes the value 0xFFFFFFFF
Thread 3 reads the value

Question

Is it possible for Thread 3 to retrieve a partial value, say 0x0000FFFF from the global variable?

I've always assumed that if an unsigned int is properly aligned, a write operation is atomic, so in this case Thread 3 would always read either 0 or 0xFFFFFFFF.

You can't assume that writes to `unsigned int` will be atomic. This is what `sig_atomic_t` is for. — Barmar, Aug 18 '23 at 20:47
If you're asking about C in general, then I agree with @Barmar that you cannot assume it will be atomic. On many systems though it would, in practice, be atomic. But your question doesn't specify that. — pmacfarlane, Aug 18 '23 at 21:00
"_Is it possible for Thread 3 to retrieve a partial value, say 0x0000FFFF from the global variable?_": It is much worse: The compiler is for many situations of the kind you mention likely going to optimize away your intended read or write. It is simply undefined behavior to write or read a non-atomic object while another thread writes to it unsynchronized and the compiler is going to make use of that UB. And even if it doesn't, there is no guarantee that a non-atomic write will propagate to another thread. It could e.g. be held in a register. Etc... — user17732522, Aug 18 '23 at 21:31
In ISO C11 it's data-race UB so literally any behaviour is allowed, but I think you're asking if it's plausible on any real hardware? Obviously a Deathstation 9000 implementation could do that by always storing and/or loading values in 16-bit halves. And there are 16-bit CPUs where there's zero expectation that `uint32_t` is ever naturally atomic. — Peter Cordes, Aug 18 '23 at 22:51
Also, this is the 32-bit equivalent of [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/q/71866535) where Nate's answer shows not-guaranteed-atomic code-gen for AArch64 storing a repeating constant. I linked this on your previous question ([Best way to atomically bitwise AND a byte in C/C++?](https://stackoverflow.com/posts/comments/135620440)) — Peter Cordes, Aug 18 '23 at 22:52
This question has come up several times before. It's true that an ordinary load/store instruction of an aligned `unsigned int` sized object is atomic on *some* architectures, maybe even *most*. The problem is that the compiler is not obliged to compile an access to `int x` into a single ordinary load or store instruction. It is allowed to accomplish the access in some other way if it chooses to do so. — Nate Eldredge, Aug 19 '23 at 02:01
It might for instance decide it's better to load or store `x` in two halves. (Suppose for instance that the value `0x0000ffff` is already in some register, and we can't afford another one to materialize `0xffffffff`, so we do two 16-bit stores instead.) Or, as in ikegami's answer, it might compile the access into no instructions at all, optimizing it away when the data race rules permit this. — Nate Eldredge, Aug 19 '23 at 02:02

score 3 · Accepted Answer · answered Aug 18 '23 at 21:46


You forgot one very likely case: That thread 3 doesn't read the value at all, using a previously read value instead.

Synchronization doesn't just prevent partial read/writes; it also informs the CPU and compiler that the value could be changed by another thread. They can and do assume that won't happen if there's no synchronization, so a cached value could be used, or the read can be optimized away, etc.

(I'm using the term "synchronization" loosely here.)

Given this possibility, it's moot whether you can end up with 0x0000FFFF or not.
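A minimal sketch of the "read optimized away" case described above, assuming a C11 compiler with `<stdatomic.h>`. The names `shared_plain`, `shared_atomic`, `wait_for_nonzero_plain` and `wait_for_nonzero_atomic` are hypothetical and do not come from the question.

```c
#include <stdatomic.h>

/* Hypothetical globals standing in for the question's variable. */
unsigned int shared_plain = 0;           /* plain object: unsynchronized write + read is a data race */
_Atomic unsigned int shared_atomic = 0;  /* atomic object: concurrent access is well defined */

/* Nothing inside this loop modifies shared_plain, so the compiler is allowed
   to load it once and keep testing the register copy. If another thread
   writes it without synchronization, this function may never notice. */
unsigned int wait_for_nonzero_plain(void)
{
    while (shared_plain == 0) {
        /* spin */
    }
    return shared_plain;
}

/* Each iteration performs an atomic load, which cannot tear. Relaxed ordering
   gives atomicity only; it does not order other memory accesses. */
unsigned int wait_for_nonzero_atomic(void)
{
    unsigned int v;
    while ((v = atomic_load_explicit(&shared_atomic, memory_order_relaxed)) == 0) {
        /* spin */
    }
    return v;
}
```

With optimization enabled, a compiler may turn the first function into a single load followed by an unconditional loop, which is the "previously read value" behavior this answer warns about. The second version only promises an untorn, eventually visible value; anything that needs a happens-before relationship would need a stronger memory order or a lock.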


ikegami


*so a cached value could be used, or the read can be optimized away.* - Those are the same thing. If you optimize away reading memory, it's by using a value from a register (which some people confusingly describe as "cached", but that's by software not the CPU hardware's data cache). If the asm has a load instruction, it will read from coherent cache, because all real-world C++ implementations [only run threads across cache-coherent cores](https://stackoverflow.com/a/58535118/224132), and it would be [hard](https://stackoverflow.com/q/76851091) to implement C++'s mem model on non-coherent shared memory. – Peter Cordes Aug 18 '23 at 22:59
(Maybe the compiler could invent a temporary in local stack space and copy to there to reduce contention? Compilers generally don't know which globals to expect contention from, so won't do that.) Anyway, if you want synchronization (like a happens-before relationship) as well as lack of tearing, you need memory *ordering*, not just atomicity. – Peter Cordes Aug 18 '23 at 23:01
But even if you just want `memory_order_relaxed`, see also [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) on LWN for a catalogue of some of the less-obvious problems that are possible and happen in practice. e.g. invented loads, so if the source does `int tmp = global;`, one use of `tmp` might see one value, a later use of `tmp` might see another value if the compiler chose to reload the global instead of copying it. That's different from tearing within one value, though. – Peter Cordes Aug 18 '23 at 23:02

John Bollinger · Answer 2 · 2023-08-18T22:42:27.950

> Is it possible for Thread 3 to retrieve a partial value, say 0x0000FFFF from the global variable?

Your program contains a data race, therefore its behavior is undefined as far as C is concerned. In principle, yes, thread 3 could observe a value different from the variable's initial value and from the values written by threads 1 and 2, without limit.

> I've always assumed that if an unsigned int is properly aligned, a write operation is atomic, so in this case, Thread 3 would always be either 0 or 0xFFFFFFFF.

It is a mistake to try to predict the behavior of a C program based on your own model of how its source code corresponds to lower level instructions. And you do not need to attempt that if you write code whose behavior is well defined by the C language spec.

It may nevertheless be the case that your particular C implementation will never exhibit the kind of shearing you ask about under the circumstances you describe, but even if so, that does not make your racy code safe. Its behavior is still undefined, and it might still misbehave in practice, for example by thread 3 never observing either of the other two threads' writes.
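One way to get the well-defined behavior this answer recommends is to make the variable atomic. The sketch below assumes C11 `<stdatomic.h>` and POSIX threads (on Windows a different thread API would be needed); `shared`, `struct job`, `writer`, `reader` and `count_unexpected_reads` are hypothetical names, and the initial value 0 is an assumption.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared variable; atomic, so concurrent access is not a data race. */
static _Atomic unsigned int shared = 0;

struct job {
    unsigned int value;  /* value a writer stores (unused by the reader) */
    int iterations;      /* how many stores or loads to perform */
    long unexpected;     /* reader only: loads that were neither 0 nor 0xFFFFFFFF */
};

static void *writer(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->iterations; i++)
        atomic_store_explicit(&shared, j->value, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->iterations; i++) {
        unsigned int v = atomic_load_explicit(&shared, memory_order_relaxed);
        if (v != 0u && v != 0xFFFFFFFFu)
            j->unexpected++;  /* would indicate a torn value */
    }
    return NULL;
}

/* Runs two writers and one reader; returns the number of torn values the
   reader saw, or -1 if a thread could not be created. */
long count_unexpected_reads(int iterations)
{
    struct job jobs[3] = {
        { 0u, iterations, 0 },           /* thread 1 writes 0 */
        { 0xFFFFFFFFu, iterations, 0 },  /* thread 2 writes 0xFFFFFFFF */
        { 0u, iterations, 0 },           /* thread 3 reads */
    };
    void *(*entry[3])(void *) = { writer, writer, reader };
    pthread_t threads[3];
    int created = 0;

    while (created < 3 &&
           pthread_create(&threads[created], NULL, entry[created], &jobs[created]) == 0)
        created++;

    /* Join only the threads that were actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return created == 3 ? jobs[2].unexpected : -1;
}
```

Built with something like `cc -std=c11 -pthread`, `count_unexpected_reads(100000)` should return 0, because every load returns the initial value or a value some thread actually stored. Relaxed ordering is used because the question only asks about tearing; it provides atomicity without any ordering guarantees for other data.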
