Let's say that on x86-64 there are 2 cores, and each core runs a thread that does the following in a loop: compare-and-swap a shared value (test whether the shared value is 0 and, if so, change it to 1), then do something else, and after that set the value back to 0. This is basically a simple spinlock. My concern is this: suppose core-1 has set the value to 1 and core-2 is busy-waiting (repeatedly testing the value). When core-1 then sets the value back to 0, the CPUs might behave like this (timeline of core-1's store of 0):
- time 0: core-1 puts the new value into its store buffer and sends a "read invalidate" message to core-2
- time 1: core-2 receives the message, saves it in its invalidate queue, and sends an ACK to core-1
- time 2: core-1 receives the ACK and flushes its store buffer
- time 1.5 or 2.5: core-2 processes (flushes) its invalidate queue
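To make the timeline concrete, here is a toy C++ model of it, using the 2.5 variant of the last step. It is only a sketch under stated assumptions: one cache line, a one-entry store buffer and a one-entry invalidate queue per core, as in the textbook MESI-with-queues description. `ToyCore`, `TimelineReads` and `run_timeline` are hypothetical names, and this models the scenario as described, not documented x86-64 hardware behavior.

```cpp
#include <cassert>
#include <optional>

// Toy model (hypothetical, not a real CPU): one cache line holding the lock
// value; each core has a one-entry store buffer and a one-entry invalidate queue.
struct ToyCore {
    int cache = 1;                   // this core's cached copy (lock held = 1)
    bool valid = true;               // becomes false once an invalidate is applied
    std::optional<int> store_buf;    // store not yet written into the cache
    bool invalidate_queued = false;  // invalidate received and ACKed, not applied

    // Loads check this core's own store buffer first (store forwarding), then
    // the cache. A queued but unapplied invalidate is ignored, so a stale copy
    // can still be read. value_on_miss is what a coherent fetch would return.
    int load(int value_on_miss) const {
        if (store_buf) return *store_buf;
        return valid ? cache : value_on_miss;
    }
};

struct TimelineReads {
    int core1_at_0_5;  // core-1 reads its own pending store
    int core2_at_0_5;  // core-2 has not even received the message yet
    int core2_at_2_2;  // invalidate is queued on core-2 but not applied
    int core2_at_3;    // after core-2 has drained its invalidate queue
};

TimelineReads run_timeline() {
    ToyCore core1, core2;
    TimelineReads r{};

    // time 0: core-1 buffers "store 0" and sends the invalidate to core-2
    core1.store_buf = 0;
    // time 0.5: both cores read the value
    r.core1_at_0_5 = core1.load(core2.cache);
    r.core2_at_0_5 = core2.load(core1.cache);

    // time 1: core-2 queues the invalidate and ACKs immediately
    core2.invalidate_queued = true;
    // time 2: core-1 receives the ACK and drains its store buffer into its cache
    assert(core1.store_buf.has_value());
    core1.cache = *core1.store_buf;
    core1.store_buf.reset();
    // time 2.2: core-2 still reads its stale cached copy
    r.core2_at_2_2 = core2.load(core1.cache);

    // time 2.5: core-2 applies the queued invalidate
    core2.valid = false;
    core2.invalidate_queued = false;
    // time 3: core-2 misses and fetches the line from core-1
    r.core2_at_3 = core2.load(core1.cache);
    return r;
}
```

In this model core-1 reads 0 at time 0.5, core-2 reads 1 at times 0.5 and 2.2, and core-2 reads 0 only after it has applied the queued invalidate.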
So if core-1 reads the value again at time 0.5, it gets the new value (0) from its own store buffer, but core-2 still sees the stale value (1). That is my guess; can it actually happen like this? If yes, how do I fix it? I don't think a memory barrier or locking the bus (the LOCK prefix) would help. Additionally, does a C++11 std::atomic value have the same problem?
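For reference, the loop described in the question could be written with C++11 `std::atomic` roughly as below. This is a minimal sketch: `SpinLock` and `run_spinlock_demo` are hypothetical names, and the comments about generated instructions describe what mainstream compilers typically emit on x86-64, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock matching the question: CAS 0 -> 1 to acquire,
// do some work, then store 0 to release.
class SpinLock {
public:
    void lock() {
        for (;;) {
            int expected = 0;
            // Typically compiled to LOCK CMPXCHG on x86-64.
            if (flag_.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
                return;
            }
            // Busy-wait with plain loads until the value looks free again
            // (test-and-test-and-set), then retry the CAS.
            while (flag_.load(std::memory_order_relaxed) != 0) {
            }
        }
    }

    void unlock() {
        // Releasing a lock that is not held would be a bug in the caller.
        assert(flag_.load(std::memory_order_relaxed) == 1);
        // Typically a plain MOV on x86-64; no fence instruction is emitted.
        flag_.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> flag_{0};
};

// Each thread increments a shared, non-atomic counter inside the lock.
long long run_spinlock_demo(int num_threads, int iterations) {
    SpinLock lock;
    long long counter = 0;  // protected only by `lock`
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, `run_spinlock_demo(2, 100000)` returns 200000; a lost update would show up as a smaller count. The inner relaxed-load loop lets the waiting core spin on reads of its cached copy instead of issuing a locked CAS on every iteration.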