C++ memory model:
Establishes memory synchronization ordering of non-atomic and relaxed atomic accesses, as instructed by order, without an associated atomic operation. Note however, that at least one atomic operation is required to set up the synchronization, as described below.
My question is why at least one atomic operation is required.
In my understanding, atomic_thread_fence acts as a flush of the load/store queue, the same as smp_rmb/smp_wmb in the Linux kernel.
So code like the following seems OK:
int i = 0, j = 0;
// cpu0:
i = 1;
atomic_thread_fence(memory_order_release);
j = 2;
// cpu1
int k = j;
atomic_thread_fence(memory_order_acquire);
if (k == 2) {
    assert(i == 1);
}
However, according to the C++ memory model, it is not OK. So my question is: what is the difference between atomic_thread_fence and smp_rmb/smp_wmb?
For comparison, this code is totally OK:
int i = 0, j = 0;
// cpu0:
i = 1;
smp_wmb();
j = 2;
// cpu1
int k = j;
smp_rmb();
if (k == 2) {
    assert(i == 1);
}
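For reference, here is a minimal sketch of what the quoted note seems to ask for in the C++ version. It assumes that j is changed to a std::atomic<int> accessed with relaxed ordering, which is not in my original snippet. The names writer, reader and run_once are hypothetical and used only for illustration. With an atomic j, the release fence before the store and the acquire fence after the load synchronize through j, so the read of i is well defined whenever k == 2.

```cpp
#include <atomic>
#include <thread>

int i = 0;                // plain data, published by the fences
std::atomic<int> j{0};    // assumption: j is atomic here, unlike the original snippet

// Plays the role of cpu0.
void writer() {
    i = 1;
    std::atomic_thread_fence(std::memory_order_release);
    j.store(2, std::memory_order_relaxed);  // atomic store sequenced after the release fence
}

// Plays the role of cpu1. Returns the observed value of i, or -1 if j was not yet 2.
int reader() {
    int k = j.load(std::memory_order_relaxed);  // atomic load sequenced before the acquire fence
    std::atomic_thread_fence(std::memory_order_acquire);
    if (k == 2) {
        return i;  // the fences synchronize through j, so this read sees i == 1
    }
    return -1;
}

// Runs both sides once on separate threads and returns what the reader saw.
int run_once() {
    i = 0;
    j.store(0, std::memory_order_relaxed);
    int seen = 0;
    std::thread t0(writer);
    std::thread t1([&seen] { seen = reader(); });
    t0.join();
    t1.join();
    return seen;
}
```

Calling run_once() from main returns either 1 (the reader saw j == 2, and therefore i == 1) or -1 (the reader ran before the store to j). With a plain int j, as in my first snippet, the concurrent access to j is itself a data race, so the fences have nothing to synchronize on.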