
I'm reading the C++ Concurrency in Action book and I'm having trouble understanding the visibility of writes to atomic variables.

Let's say we have a

std::atomic<int> x = 0;

and we read/write it with sequentially consistent ordering

1. ++x;        
   // <-- thread 2
2. if (x == 1) {
   // <-- thread 1
   }

Suppose two threads execute the code above. Is it possible that thread 1 arrives at line 2. and reads x == 1 after thread 2 has already executed line 1.? In other words, does the sequentially consistent ++x of thread 2 get propagated to thread 1 instantly, or is it possible that thread 1 reads a stale value x == 1?

I think the above situation is possible if we use relaxed ordering or acq/rel, but what about sequentially consistent ordering?

dawuald
  • With sequential consistent ordering, you should never read a stale value. If thread A modifies a value, and thread B reads that value afterwards, it'll read the modified value. – Jerry Coffin Jul 13 '22 at 19:55
  • It isn't clear what you mean by "already executed line 1". There is no absolute time scale that runs for both threads. Each thread has its own time. You can talk about effects of action A on thread X time becoming visible to thread Y when thread Y reaches point B on *its* time. But it makes no sense to talk about A and B happening "at the same time", or changes from A propagating "instantly" to B, there is no such thing. – n. m. could be an AI Jul 13 '22 at 20:13
  • Yeah, it's really hard for me not to think in terms of time. So let's say between line 1. and 2. there is some action A. Can I be absolutely sure that if I enter the if condition in line 2. in thread 1, the effect of action A performed by thread 2 could not have been visible to thread 1? And does it hold for seq consistency only or for other orderings as well? – dawuald Jul 13 '22 at 20:24
  • @dawuald: For a simple case like this, where only atomics are involved, and it's always the *same* atomic, aside from relaxed ordering, I'm fairly sure you'd be guaranteed that all threads would observe the operations on this specific atomic in a reliable order. Different threads could interleave, but the order would "make sense"; each increment would precede the test in that same thread, but the other thread's increment (and possibly test) could occur between the increment and the test in the other thread. – ShadowRanger Jul 13 '22 at 20:33
  • In general, anything above relaxed ordering will get the same end result considering the observed behavior of a *single* atomic variable. They only make a difference when it comes to observing changes amongst multiple atomic or non-atomic variables (and without sequential consistency, it's possible for such a multi-variable system to observe some operations happen in different orders in each thread). – ShadowRanger Jul 13 '22 at 20:39
  • Hm yeah but still, my problem is that if we imagine some arbitrarily complex operation A between line 1. and 2. And now in thread 1 at line 2. we read x and it is equal to 1. Then if there is no way that we can read a stale x, then I could be sure that thread t2 couldn't have begun operation A until this point (from the perspective of t1). If, on the other hand, it is possible to read a stale x == 1 in thread t1 even though it has already been incremented by thread 2, then I cannot be sure that the effects of operation A performed by t2 are invisible to t1. – dawuald Jul 13 '22 at 20:48
  • @dawuald: With sequential consistency, if thread 1 sees `x == 1` then at that *exact* instant, you'd know thread 2 had not yet performed the increment, and therefore has not begun operation A. But that could change before thread 1 begins executing code in the block controlled by the `if`, so it doesn't mean much; the scheduler could easily swap out thread 1 *immediately* after `x == 1` is tested but before the result is used to jump (or not jump) and thread 2 could hit `++x` at that instant, then run operation A to completion and hit its own test (seeing `x == 2`). It's not super *useful*. – ShadowRanger Jul 13 '22 at 20:54
  • @ShadowRanger ok great, this is what I wanted to know :). I know that after I enter the if condition the x == 1 may no longer be up-to-date. But at least I can be sure that after thread t1 finished operation A, thread t2 couldn't have started yet. Which may be useful in some situations. Would it still hold if I had used relaxed ordering instead? – dawuald Jul 13 '22 at 21:03
  • @dawuald: Relaxed ordering would break this entirely. Even ignoring weakly memory ordered systems and the craziness they allow, relaxed ordering would allow the compiler to statically emit code that executes the `++x` *after* your hypothetical "operation A" (assuming it didn't use `x` within it or use other atomics with stricter ordering). You'd never notice if threads weren't involved (nothing uses the value), so all the relaxed ordering gets you is a guarantee that `++x` won't be performed as a separate read, modify, write (that would allow for dropped increments, and/or torn reads/writes). – ShadowRanger Jul 13 '22 at 21:09
  • @ShadowRanger ok great, that's what I've been thinking too. Then last question: how about acq_rel ordering on x? I believe I could also read a stale value, right? – dawuald Jul 13 '22 at 21:13
  • @dawuald: I'd recommend looking at [cppreference's example of what can happen with relaxed ordering](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering); it's a fun one, where relaxed operations on two different atomics passing through non-atomic caches can end up having the *second* operation in one thread affect the *first* operation *in that same thread* thanks to the other thread performing other relaxed ordering operations. – ShadowRanger Jul 13 '22 at 21:14
  • @ShadowRanger, Thanks, but I know this example already. Since it involves 2 variables it is easier to grasp for me. My problem was more with inter-thread visibility of writes to a single variable. – dawuald Jul 13 '22 at 21:18
  • https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ is a useful mental model for separate threads accessing shared memory, and how other threads only see your stores after they commit to L1d cache after going through the store buffer. – Peter Cordes Jul 13 '22 at 21:18
  • @dawuald: I think acq/rel (for the increment) with acq (for the test) is safe. Acq/rel prevents any store/load reordering within that thread, so operation A in thread 2 cannot begin before the `++x` in thread 2 occurs and is visible to thread 1. That said, it has the same "it could increment a nanosecond later" issue, and if the increment is seen to have completed, you can't actually expect to see the effects of operation A even if it actually happened (with no atomic release operations after that point, the effects of operation A in thread 2 might not be visible to thread 1 for some time). – ShadowRanger Jul 13 '22 at 21:30
  • Near duplicate: [What C++11 operations/memory orders guarantees freshness?](https://stackoverflow.com/q/14687703) – Peter Cordes Jul 20 '22 at 15:19
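
Putting the scenario from the comments above into code, here is a minimal sketch under sequentially consistent ordering. The names operation_A and worker, and the thread setup, are illustrative placeholders, not part of the original post:

#include <atomic>
#include <thread>

std::atomic<int> x{0};

// Hypothetical stand-in for the "arbitrarily complex operation A" from the comments.
void operation_A() {}

void worker()
{
    ++x;            // line 1.: seq_cst read-modify-write
    operation_A();  // the "action A" placed between line 1. and line 2.
    if (x == 1) {   // line 2.: seq_cst load
        // Taking this branch means the other thread had not yet performed its
        // ++x at the instant of this load, so (by program order) it had not yet
        // started operation_A() either. It may do both immediately afterwards.
    }
}

int main()
{
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
}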

2 Answers


If you're thinking that multiple atomic operations are somehow safely grouped, you're wrong. They'll always occur in order within that thread, and they'll be visible in that order, but there is no guarantee that two separate operations will occur in one thread before either occurs in the other.

So for your specific question "Is it possible that thread 1 arrives at line 2. and reads x == 1, after thread 2 already executed line 1.?": thread 1 can certainly reach the x == 1 test after thread 2 has also incremented x, but in that case it reads the up-to-date value 2, not a stale 1, so neither thread would see x == 1 as true.

The simplest way to think about this is to imagine a single processor system, and consider what happens if the running thread is switched out at any time aside from the middle of a single atomic operation.

So in this case, the operations (inc1 and test1 for thread 1 and inc2 and test2 for thread 2) could occur in any of the following orders:

  • inc1 test1 inc2 test2
  • inc1 inc2 test1 test2
  • inc1 inc2 test2 test1
  • inc2 inc1 test1 test2
  • inc2 inc1 test2 test1
  • inc2 test2 inc1 test1

As you can see, neither test can occur before the increment in its own thread, nor can both tests pass (the only way a test passes is if the increment in its own thread has occurred but the increment in the other thread has not), but there's no guarantee any test passes: both increments could precede both tests, causing both tests to compare against the value 2 and fail. The race window is narrow, so most of the time you'd probably see exactly one test pass, but it's entirely possible to get unlucky and have neither pass.
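
To make those interleavings concrete, here is a minimal sketch that runs the inc/test pair on two threads many times and tallies the outcomes. The loop, counter names, and thread setup are illustrative, not part of the original question:

#include <atomic>
#include <iostream>
#include <thread>

int main()
{
    int neither = 0, exactly_one = 0, both = 0;
    for (int run = 0; run < 10000; ++run) {
        std::atomic<int> x{0};
        std::atomic<int> passed{0};       // how many tests saw x == 1 this run
        auto body = [&] {
            ++x;                          // "inc" for this thread (seq_cst RMW)
            if (x == 1)                   // "test" for this thread (seq_cst load)
                ++passed;
        };
        std::thread t1(body), t2(body);
        t1.join();
        t2.join();
        if (passed == 0)      ++neither;      // both increments beat both tests
        else if (passed == 1) ++exactly_one;
        else                  ++both;         // never happens under seq_cst
    }
    std::cout << "exactly one: " << exactly_one
              << ", neither: " << neither
              << ", both: " << both << '\n';
}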

If you want to make this work reliably, you need to make sure you both modify and test in a single operation, so exactly one thread will see the value as being 1:

if (++x == 1) {  // The first thread to get here will do the stuff
   // Do stuff
}

In this case, the increment and read are a single atomic operation, so the first thread to get to that line (which might be thread 1 or thread 2, no guarantees) will perform the first increment with ++x atomically returning the new value which is tested. Two threads can't both see x become 1, because we kept both increment and test as one operation.

That said, if you're relying on the content of that if being completed before any thread executes code after the if, this won't work; the first thread could enter the if, while the second thread arrives nanoseconds later, sees it wasn't the first to get there, skips the block, and immediately begins executing the code after the if even though the first thread hasn't finished. Simple use of atomics like this is not suited to the "run only once" scenario people often write this code for, where the "run only once" code must finish before any dependent code runs.
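
Here is a minimal runnable sketch of that fetch-and-test pattern; the worker function, the printed message, and the thread setup are illustrative:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x{0};

void worker(int id)
{
    if (++x == 1) {
        // Exactly one of the two threads observes the result of its own
        // increment as 1, because the increment-and-read is a single atomic RMW.
        std::cout << "thread " << id << " ran the one-time work\n";
    }
    // Caveat from above: the other thread may reach this point before the
    // one-time work has finished; this pattern does not make it wait.
}

int main()
{
    std::thread t1(worker, 1), t2(worker, 2);
    t1.join();
    t2.join();
}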

ShadowRanger
  • Note: If *different* atomic (or non-atomic) variables were involved, with non-sequential consistency memory orders, my description of "imagine a single processor system" becomes pretty useless. While x86 has a strongly ordered memory model, and any model stricter than relaxed ordering will behave roughly as you'd expect from that thought exercise, on non-x86 systems with weakly ordered memory models, things can get *really* weird, where operations A then B executed in thread 1 might be observed to occur as B then A in thread 2 if no memory order constraint forces the effects to become visible. – ShadowRanger Jul 13 '22 at 20:49
  • Was about to say *and they'll be visible in that order* is only true if readers and writers all use the default `seq_cst` order. Enforcing StoreLoad ordering is the most expensive part of that (especially for x86 where nothing else needs any barriers: only seq_cst pure-stores. Atomic RMWs are already full barriers in x86 asm; it can't do anything weaker). – Peter Cordes Jul 13 '22 at 21:23
  • @PeterCordes: Yeah, I was starting off with "for your original, unmodified scenario with seq cst everywhere" the answer is *fairly* simple. Once you get into non-seq cst on weakly ordered memory systems, things get *weird* (though the weirdness is reduced when *all* the operations apply to a single atomic variable, with no other atomics or non-atomics involved). :-) – ShadowRanger Jul 13 '22 at 21:33

Let's simplify your question.

When two threads execute func(),

#include <atomic>
#include <iostream>

std::atomic<int> x{0};

void func()
{
    ++x;             // atomic read-modify-write
    std::cout << x;  // separate atomic load of x, then print
}

is the following result possible?

11

And the answer is NO! You can get "12", "21", or even "22" (if both increments happen before either print), but never "11".

Sequentially consistent operations on a single atomic variable work as you would expect in this simple case.
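
A minimal, self-contained driver for the example might look like this; the main function and thread setup are illustrative, and the definitions of x and func are repeated from the snippet above:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x{0};

void func()
{
    ++x;             // atomic increment
    std::cout << x;  // separate atomic load, printed without further synchronization
}

int main()
{
    std::thread t1(func), t2(func);
    t1.join();
    t2.join();
    std::cout << '\n';
    // Possible outputs: "12", "21", or "22" -- never "11".
}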

  • That's not as helpful as it might be, because the C++ standard has special language for RMW operations, guaranteeing *they* see the latest value. e.g. [Is a memory barrier required to read a value that is atomically modified?](https://stackoverflow.com/q/71718224) has some discussion in comments, and a section in my answer, about why that's *not* important or better. – Peter Cordes Jul 20 '22 at 15:18
  • Also related: [What C++11 operations/memory orders guarantees freshness?](https://stackoverflow.com/q/14687703) - nothing, the operations could happen in either order, which is the point you're trying to make. And yes, it's true whether they're both RMWs, or if one is a read. – Peter Cordes Jul 20 '22 at 15:19
  • `std::memory_order_seq_cst` isn't required for this, it would still apply with `x.fetch_add(1, relaxed)`. Every atomic variable has its own *modification order* that all threads can agree on, but that only serializes modifications, not reads. The coherency requirements involving reads just guarantee that if you've seen one value in the modification order, you can't see an earlier one. But they're not as strong as seq_cst. If you just meant modification order rules, I'd recommend avoiding the phrase "sequential consistency". – Peter Cordes Jul 20 '22 at 15:32
  • Mentioning memory consistency models other than sequential consistency only makes the answer more complicated. Explanations of more precise consistency models should be prompted by more precise questions. I think you are answering far more than the question asked. – Jung Nai Hoon Jul 21 '22 at 10:30