Non-deterministic read values when using std::atomic store/load with std::memory_order_seq_cst

Question

I've started learning about memory orderings in C++ using std::atomic, and I'm trying to understand the synchronization mechanism between a store and a subsequent load of an atomic variable from two different threads. Suppose we call the load and the store from two different threads using the default memory order std::memory_order_seq_cst, like this:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> data(0);

void func() {
  data.store(1234, std::memory_order_seq_cst);
}

int main() {
  std::thread t(func);
  // Races with the store in func(): may observe 0 or 1234.
  int val = data.load(std::memory_order_seq_cst);
  std::cout << "value: " << val << std::endl;
  t.join();
  return 0;
}

I'm seeing non-deterministic output (0 most of the time, but sometimes 1234). I learned that atomic loads and stores must happen in program order, which I took to mean that they are synchronized, so what I'm seeing seems to contradict what I learned. What's the gap in my understanding? Is it that while the store and load are ordered, the memory itself is not coherent between the two threads? (By the way, I compiled the program above using g++ -std=c++17 -pthread main.cpp -o main.)

I compiled and ran the program above.

Sometimes the new thread runs first, sometimes it doesn't. Simple as that. — Jesper Juhl, Aug 16 '23 at 17:48
Various memory orders only matter when you have more than just one atomic variable, they define how this variable interacts with other operations. — HolyBlackCat, Aug 16 '23 at 17:51
"Program order" means the program order *within a single thread*. The store in `func()` and the load in `main` are in two separate threads; there is no program ordering between them. In the language of the C++ standard, they are *unsequenced*. — Nate Eldredge, Aug 19 '23 at 02:13
The significance of synchronization here is that *if* the load returns the value 1234 that was stored, then the store synchronizes with it. This is not relevant to the value returned by the load itself (which we're already assuming is 1234), but to the behavior of surrounding accesses to other variables. For example, if `func()` writes to some other variable `x` before doing the store, and `main` reads from `x` after doing the load, *and* the load returns 1234, then the value read from `x` in `main` will be the value written in `func`. But this has nothing to do with your question. — Nate Eldredge, Aug 19 '23 at 02:16

score 2 · Accepted Answer · answered Aug 16 '23 at 18:22

TL;DR: as Jesper Juhl commented:
Sometimes the new thread runs first, sometimes it doesn't. Simple as that.

The whole point of threads is that they can run independently of each other. seq_cst means that the total order is some interleaving of program order of each thread, but there's no guarantee which interleaving you'll get.

Both orders are allowed: the store first and the load second, or vice versa. With only one atomic operation in each thread, and no other shared data, seq_cst isn't doing anything that relaxed wouldn't.
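To make the "both interleavings are allowed" point concrete, here is a minimal sketch that repeats the race from the question many times and tallies what the load observed. The names race_once, run_trials and Outcomes are hypothetical helpers invented for this illustration. How the counts split between the two outcomes depends on the OS scheduler and will likely differ from run to run; the only thing the language guarantees is that every load sees either 0 or 1234.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: run the question's store/load race once and
// return whatever value the load happened to observe.
int race_once() {
  std::atomic<int> data(0);
  std::thread t([&data] { data.store(1234, std::memory_order_seq_cst); });
  // Nothing orders this load relative to the store in the other thread.
  int val = data.load(std::memory_order_seq_cst);
  t.join();
  return val;
}

// Hypothetical tally of the outcomes seen over many trials.
struct Outcomes {
  int saw_initial = 0;  // load ran before the store: observed 0
  int saw_stored = 0;   // load ran after the store: observed 1234
  int saw_other = 0;    // stays 0: atomic loads never see torn values
};

Outcomes run_trials(int n) {
  Outcomes out;
  for (int i = 0; i < n; ++i) {
    int v = race_once();
    if (v == 0) {
      ++out.saw_initial;
    } else if (v == 1234) {
      ++out.saw_stored;
    } else {
      ++out.saw_other;
    }
  }
  return out;
}
```

Calling run_trials(1000) from main and printing the two counters should show a mix that is typically dominated by 0, since starting a thread usually takes longer than reaching the load.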

Your program doesn't do anything to guarantee that the load runs before the store, or after it. For example, putting the load before the std::thread t(func); constructor, or after t.join();, would each create a happens-before relationship between the load and the store.
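The two placements mentioned above can be sketched as follows. The function names load_before_thread_start and load_after_join are made up for this illustration; the ordering guarantees come from std::thread itself (completion of the constructor synchronizes with the start of the thread function, and the end of the thread synchronizes with the return from join()).

```cpp
#include <atomic>
#include <thread>

// Load sequenced before the thread is created: the load happens-before
// the store, so it cannot observe 1234.
int load_before_thread_start() {
  std::atomic<int> data(0);
  int val = data.load(std::memory_order_seq_cst);
  std::thread t([&data] { data.store(1234, std::memory_order_seq_cst); });
  t.join();
  return val;  // always 0
}

// Load sequenced after join(): the store happens-before the load,
// so the load must observe it.
int load_after_join() {
  std::atomic<int> data(0);
  std::thread t([&data] { data.store(1234, std::memory_order_seq_cst); });
  t.join();
  return data.load(std::memory_order_seq_cst);  // always 1234
}
```

In both variants the result is deterministic because of where the load sits relative to thread creation and join, not because of the memory order argument.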

In your current program, it's just up to chance and the OS's scheduling decisions on thread creation whether data.load runs before or after data.store runs (and the data goes through the store buffer and commits to cache, becoming globally visible).

"Is it that while the store and load is ordered, the memory itself is not coherent between the two threads?"

No, C++ guarantees coherency - a later read is guaranteed to see a value from an earlier store, from the modification order of the object you're reading.

(It would be really hard to actually make a C++ implementation on hardware without coherent cache, since C++ requires that separate threads can modify adjacent char objects in an array without interfering with each other, among other things. If two threads had dirty copies of the same line, they'd need per-byte dirty bitmaps to merge on commit if they wanted to avoid stepping on the other thread's store during write-back. All real hardware has coherent cache between cores that std::thread can run threads across, typically with MESI so a core doing a store has to get exclusive ownership of the cache line first (invalidating all other copies), before modifying it.)
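As a small illustration of the adjacent-char requirement, the sketch below has two threads write to neighbouring bytes of the same array with no synchronization other than join(). The function name fill_adjacent_chars is hypothetical. Because buf[0] and buf[1] are separate memory locations, this is not a data race, and neither thread's writes may clobber the other's byte.

```cpp
#include <string>
#include <thread>

// Two threads repeatedly write to adjacent bytes, which almost certainly
// share a cache line. Each byte is a distinct memory location, so the
// writes must not interfere with each other.
std::string fill_adjacent_chars(int iterations) {
  char buf[2] = {'?', '?'};
  std::thread a([&buf, iterations] {
    for (int i = 0; i < iterations; ++i) buf[0] = 'A';
  });
  std::thread b([&buf, iterations] {
    for (int i = 0; i < iterations; ++i) buf[1] = 'B';
  });
  a.join();
  b.join();
  // join() makes both threads' writes visible here.
  return std::string(buf, 2);  // "AB" when iterations > 0
}
```

A passing run cannot prove coherence, of course; the example only shows the kind of code the language requires implementations to support.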

C++ atomics reading stale value - near duplicate. You only get synchronization between threads if the load does happen to see a value stored by another thread.
Sequentially consistent fence - C++ memory barriers (and operations with non-relaxed memory orders) aren't like pthread_barrier() synchronization primitives that wait for all threads to reach them (wikipedia).
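Both linked points can be sketched in one example: an atomic load never waits for another thread, so a reader that needs the stored value has to keep loading until it sees it, and only the load that actually observes the store gets synchronization with the writer. The names receive_payload, payload and ready are assumptions made for this illustration, not something from the question.

```cpp
#include <atomic>
#include <thread>

int receive_payload() {
  int payload = 0;                  // plain, non-atomic data
  std::atomic<bool> ready(false);   // flag that publishes the payload

  std::thread producer([&] {
    payload = 42;                                  // written before the flag
    ready.store(true, std::memory_order_seq_cst);  // publish
  });

  // A single load would return immediately with whatever value is current.
  // To wait for the store, we have to loop ourselves.
  while (!ready.load(std::memory_order_seq_cst)) {
    std::this_thread::yield();
  }

  // The load that returned true synchronizes with the store, so the
  // earlier write to payload is guaranteed to be visible here.
  int seen = payload;
  producer.join();
  return seen;  // always 42
}
```

With seq_cst replaced by release on the store and acquire on the load, the same guarantee would hold; seq_cst is kept here only to match the question.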

score -2 · Answer 2 · answered Aug 16 '23 at 18:09

There is an example of std::memory_order_seq_cst found at https://en.cppreference.com/w/cpp/atomic/memory_order that has the sub threads join before the read in the main thread:

int main()
{
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join(); b.join(); c.join(); d.join();
    assert(z.load() != 0);  // will never happen
}

Catcow

The code you posted doesn't compile. – Jesper Juhl Aug 16 '23 at 18:53
