No amount of barriers can help you avoid data-race UB if you begin another write of the non-atomic variables right after the release-store.
It will always be possible (and likely) for some non-atomic writes to `a`, `b`, and `c` to be "happening" while your reader is reading those variables, therefore in the C abstract machine you have data-race UB. (In your example: unsynced write+read of `a`, unsynced write+write of `b`, write+read of `b`, and write+write of `c`.)
Also, even without loops, your example would still not safely avoid data-race UB, because your TaskB accesses `a`, `b`, and `c` unconditionally after the `flag.load`. So it touches them whether or not it observed the `data_ready = 1` signal from the writer saying that the vars are ready to be read.
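For contrast, here is a minimal sketch of the publish-once pattern that does avoid the race, assuming C++11 `std::atomic`: the writer writes `a`, `b`, and `c` once, release-stores the flag, and never writes them again, while the reader spins on an acquire load and only reads the payload after it has seen `1`. The function names and the values 1, 2, 3 are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the question's variables: a plain (non-atomic)
// payload published exactly once through an atomic flag.
int a, b, c;
std::atomic<int> flag{0};

void writer() {
    a = 1;
    b = 2;
    c = 3;
    flag.store(1, std::memory_order_release);  // publish; a/b/c are never written again
}

int reader() {
    // Wait for the signal *before* touching the payload.
    while (flag.load(std::memory_order_acquire) != 1) {
        std::this_thread::yield();
    }
    // The acquire load that saw 1 synchronizes-with the release store,
    // so these reads happen-after the writes and don't race with them.
    return a + b + c;
}

// Usage sketch: run one writer and one reader, return what the reader saw.
int publish_once_demo() {
    a = b = c = 0;                             // reset before any threads start
    flag.store(0, std::memory_order_relaxed);
    int seen = 0;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    return seen;
}
```

As soon as the writer starts another round of assignments to `a`, `b`, or `c` after the release-store, this guarantee is gone, which is the point of the paragraphs above.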
Of course, in practice on real implementations, repeatedly writing the same data is unlikely to cause problems here, except that the value read for `b` will depend on how the compiler optimizes. But that's because your reader also writes, not just reads.
Mainstream CPUs don't have hardware race detection, so it won't actually fault or anything. If you did actually wait for `flag == 1` and then only read, you would see the expected values even if the writer was running more assignments of the same values. (A DeathStation 9000 could implement those assignments by temporarily storing something else in that space, so the bytes in memory would actually be changing rather than holding stable copies of the values from before the first release-store. That's not something you'd expect a real compiler to do, but I still wouldn't bet on it, and this seems like an anti-pattern.)
This is why lock-free queues use multiple array elements, and why a seqlock doesn't work this way. (A seqlock can't be implemented both safely and efficiently in ISO C++, because it relies on reading maybe-torn data and then detecting the tearing; if you instead use narrow-enough relaxed atomics for the chunks of data, you're hurting efficiency.)
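A minimal sketch of the multiple-array-elements idea, assuming exactly one producer thread and one consumer thread: a ring buffer where the producer only writes a slot after the consumer has handed it back by advancing `head_`, so no slot is ever written while it might still be being read. `SpscQueue` and `spsc_demo` are hypothetical names, not from any library, and this is just one common way to structure such a queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical single-producer / single-consumer ring buffer.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write; written only by the producer
public:
    // Producer only. Returns false if the queue is full.
    bool push(const T& v) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        // Acquire: the consumer's read of the slot we're about to reuse
        // happens-before our write to it.
        if (t - head_.load(std::memory_order_acquire) == N) return false;
        buf_[t % N] = v;                                // slot not yet visible to the consumer
        tail_.store(t + 1, std::memory_order_release);  // publish this one element
        return true;
    }
    // Consumer only. Returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T v = buf_[h % N];                              // producer won't touch this slot until head_ advances
        head_.store(h + 1, std::memory_order_release);  // hand the slot back to the producer
        return v;
    }
};

// Usage sketch: one producer pushes 0..n-1, one consumer pops them.
// Returns true if every value arrived in order.
bool spsc_demo(int n) {
    SpscQueue<int, 64> q;
    bool in_order = true;  // written only by the consumer, read after join
    std::thread producer([&] {
        for (int i = 0; i < n; ++i)
            while (!q.push(i)) { /* full: spin */ }
    });
    std::thread consumer([&] {
        for (int expected = 0; expected < n;) {
            if (std::optional<int> v = q.pop()) {
                if (*v != expected) in_order = false;
                ++expected;
            }
        }
    });
    producer.join();
    consumer.join();
    return in_order;
}
```

The key difference from the question's code is that each element is written once per trip around the ring, and only after the reader has said it is done with that slot.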
The whole idea of wanting to write again, maybe before a reader has finished reading, sounds a lot like you should be looking into a SeqLock. See https://en.wikipedia.org/wiki/Seqlock, and the other links in my linked answer in the last paragraph.
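A minimal single-writer seqlock sketch, assuming C++11 atomics and using relaxed atomics for the payload, i.e. the safe-but-less-efficient option mentioned above. The reader may overlap a write, but it detects that through the sequence number and retries. `SeqLockPair` and `seqlock_demo` are hypothetical names; a real seqlock would usually protect a larger payload, which is where the per-chunk atomic cost shows up.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical single-writer seqlock protecting a two-field payload.
struct SeqLockPair {
    std::atomic<unsigned> seq{0};  // odd while a write is in progress
    std::atomic<int> x{0}, y{0};   // relaxed atomics, so racing reads aren't UB

    // Must only be called from one writer thread.
    void write(int nx, int ny) {
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);          // mark write in progress
        std::atomic_thread_fence(std::memory_order_release);  // order the odd seq store before the data stores
        x.store(nx, std::memory_order_relaxed);
        y.store(ny, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);          // publish: even again
    }

    // Retries until it gets a snapshot that no write overlapped.
    void read(int& ox, int& oy) const {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            if (s0 & 1) continue;                                 // writer mid-update
            int rx = x.load(std::memory_order_relaxed);
            int ry = y.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);  // keep the data loads before the re-check
            if (seq.load(std::memory_order_relaxed) == s0) {      // unchanged: no tearing
                ox = rx;
                oy = ry;
                return;
            }
        }
    }
};

// Usage sketch: the writer stores (i, 2*i) pairs; the reader checks it
// never sees a torn pair like (i, 2*(i-1)).
bool seqlock_demo(int n) {
    SeqLockPair p;
    std::atomic<bool> done{false};
    bool consistent = true;  // written only by the reader, read after join
    std::thread writer([&] {
        for (int i = 1; i <= n; ++i) p.write(i, 2 * i);
        done.store(true, std::memory_order_release);
    });
    std::thread reader([&] {
        int rx = 0, ry = 0;
        while (!done.load(std::memory_order_acquire)) {
            p.read(rx, ry);
            if (ry != 2 * rx) consistent = false;
        }
        p.read(rx, ry);  // after done, the last write is visible
        if (rx != n || ry != 2 * n) consistent = false;
    });
    writer.join();
    reader.join();
    return consistent;
}
```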