This question is based on "Can't relaxed atomic fetch_add reorder with later loads on x86, like store can?". I agree with the answer given there. On x86, 00 will never occur because a.fetch_add compiles to an instruction with a lock prefix, which acts as a full barrier, so later loads can't be reordered above the fetch_add. On other architectures such as ARM or MIPS, the program can print 00. I have two follow-up questions about the store buffer on x86 and ARM.
I never get 11 on my PC (Core i3, x86_64). Is 11 a valid output on x86 under ISO C++, or am I missing something? @Daniel Langr demonstrated that 11 is a valid output on x86.
x86_64 has an advantage here: fetch_add acts as a full barrier.
On arm64, the output can sometimes be 00 because the CPU may reorder the instructions.
On arm64 or some other architecture, can the output be 00 even without reordering? My reasoning is this: the result of a.fetch_add(1) in foo is still in cpu0's store buffer and so is not visible to a.load() in bar, and likewise the result of b.fetch_add(1) in bar is still in cpu1's store buffer and so is not visible to b.load() in foo. Hence we would get 00 without any reordering. Can this happen under ISO C++ on different architectures?
// g++ -O2 -pthread axbx.cpp ; while [ true ]; do ./a.out | grep "00" ; done
#include<cstdio>
#include<thread>
#include<atomic>
using namespace std;
atomic<int> a,b;
int reta,retb;
void foo(){
a.fetch_add(1,memory_order_relaxed); //add to a is stored in store buffer of cpu0
//a.store(1,memory_order_relaxed);
retb=b.load(memory_order_relaxed);
}
void bar(){
b.fetch_add(1,memory_order_relaxed); //add to b is stored in store buffer of cpu1
//b.store(1,memory_order_relaxed);
reta=a.load(memory_order_relaxed);
}
int main(){
thread t[2]{ thread(foo),thread(bar) };
t[0].join(); t[1].join();
printf("%d%d\n",reta,retb);
return 0;
}