While trying to understand memory orders I stumbled upon this video. The video claims that the assertion at the end of the main function may fail, but I do not understand why or if this is correct.

What I understand from `std::memory_order_release` is that no reads or writes in the current thread can be reordered after this store, and for `std::memory_order_acquire`, that no reads or writes in the current thread can be reordered before this load. But each reading thread in the example waits for a different writer thread, and the `if` in a reading thread cannot be reordered before the `while` because of the `std::memory_order_acquire`. So shouldn't at least one thread increment the `z` variable?

```cpp
#include <atomic>
#include <thread>
#include <cassert>

std::atomic<bool> x;
std::atomic<bool> y;
std::atomic<int> z;

// Writer threads: each publishes one flag with a release store.
void write_x() {
  x.store(true, std::memory_order_release);
}

void write_y() {
  y.store(true, std::memory_order_release);
}

// Spin until x is set, then count if y is already visible.
void read_x_then_y() {
  while (!x.load(std::memory_order_acquire)) {}

  if (y.load(std::memory_order_acquire)) {
    z++;
  }
}

// Spin until y is set, then count if x is already visible.
void read_y_then_x() {
  while (!y.load(std::memory_order_acquire)) {}

  if (x.load(std::memory_order_acquire)) {
    z++;
  }
}

int main() {
  x = false;
  y = false;
  z = 0;

  std::thread a(write_x);
  std::thread b(write_y);
  std::thread c(read_x_then_y);
  std::thread d(read_y_then_x);

  a.join();
  b.join();
  c.join();
  d.join();

  // The video claims this assertion may fail.
  assert(z != 0);

  return 0;
}
```

Compiled with `CPPFLAGS="-latomic -pthread" make main`

Peter Cordes
MangaD
  • This is the IRIW litmus test: whether two reader threads agree about the order of two writes by two other threads. `release` or `relaxed` makes no difference for the writers; only `seq_cst` on both writes and reads would rule out IRIW in ISO C++. (In practice all real hardware except some POWER CPUs will agree on the order of writes, whatever it turns out to be in a given run.) – Peter Cordes Dec 18 '21 at 14:12
  • The issue you're missing is that there's no synchronization between threads `c` and `d`. Even though it might seem like "either `y` is stored first, or `x` is stored first", in reality it can be "for thread `c`, `y` is stored first, but for thread `d`, it's `x`". This limitation has to do with how cores operate on caches: each thread may communicate with its own cache. You can use `seq_cst` to avoid this issue. – ARentalTV Dec 18 '21 at 14:14
  • @ARentalTV: Nope, cache isn't the mechanism for IRIW reordering. All real-world machines that run C++ threads across multiple cores have those cores in the same cache coherency domain. To get IRIW reordering, you need some side channel for stores to become visible to some cores before they become *globally* visible to all cores besides the one doing the stores (by committing to coherent L1d cache). On POWER, that's store-forwarding of data from retired store instructions between SMT threads on the same physical core. See [this Q&A](https://stackoverflow.com/q/27807118) about the HW details. – Peter Cordes Dec 18 '21 at 15:01
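
For reference, here is a minimal sketch of the `seq_cst` variant that both comments point to. It assumes every store and load on `x` and `y` uses `std::memory_order_seq_cst`, so all four threads agree on a single total order of the two writes and at least one reader should see both flags set. The helper names `run_once` and `check_many` are hypothetical and not from the question or the video.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the four-thread test once with seq_cst on every
// access and returns the final value of z.
int run_once() {
  std::atomic<bool> x{false};
  std::atomic<bool> y{false};
  std::atomic<int> z{0};

  // Writers: seq_cst stores take part in the single total order.
  std::thread a([&] { x.store(true, std::memory_order_seq_cst); });
  std::thread b([&] { y.store(true, std::memory_order_seq_cst); });

  // Readers: same shape as read_x_then_y / read_y_then_x in the question.
  std::thread c([&] {
    while (!x.load(std::memory_order_seq_cst)) {}
    if (y.load(std::memory_order_seq_cst)) {
      z.fetch_add(1, std::memory_order_seq_cst);
    }
  });
  std::thread d([&] {
    while (!y.load(std::memory_order_seq_cst)) {}
    if (x.load(std::memory_order_seq_cst)) {
      z.fetch_add(1, std::memory_order_seq_cst);
    }
  });

  a.join();
  b.join();
  c.join();
  d.join();
  return z.load();
}

// Hypothetical helper: repeats the test; the assert mirrors the one at the
// end of main() in the question.
void check_many(int runs) {
  for (int i = 0; i < runs; ++i) {
    int z = run_once();
    assert(z != 0);
  }
}
```

Calling `check_many(1000)` from `main` should never trip the assert under the ISO C++ rules, whereas the acquire/release version in the question is allowed to end with `z == 0` (even if most hardware never shows it).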
