For 1) and 2), no: some other thread that loads foo won't sync-with foo.exchange(acquire) in another thread, because it's only an acquire, not a release operation. So that other thread won't safely be able to read the values of non-atomic assignments from before the exchange, or get guaranteed values for earlier atomic stores.
3) and 4) have various problems in terms of (not) syncing with another writer or reader to create a happens-before relationship. That only happens when one thread does an acquire load that sees the value from a release store in another thread. If the store side of the exchange is relaxed, that doesn't happen.
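To make the requirement concrete, here is a minimal sketch of a pairing that does create a happens-before edge: the exchange's store side is release (via acq_rel) and the reader uses an acquire load. The names payload, seen, and publish_and_read are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the payload value the reader observed.
int publish_and_read() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<int> foo{0};
    int seen = -1;

    std::thread writer([&] {
        payload = 42;                 // non-atomic write before publishing
        // The store side must be release for a reader to sync-with it.
        // With exchange(1, acquire) the store side would only be relaxed.
        foo.exchange(1, std::memory_order_acq_rel);
    });
    std::thread reader([&] {
        // Acquire load that eventually sees the value stored by the exchange.
        while (foo.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        seen = payload;               // safe: happens-after payload = 42
    });

    writer.join();
    reader.join();
    assert(seen == 42);               // guaranteed by the release/acquire pairing
    return seen;
}
```

If the writer used only memory_order_acquire on the exchange, the reader's read of payload would be a data race as far as ISO C++ is concerned, even though it would usually appear to work on real hardware.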
IDK if you're thinking of dummy.store(0, std::memory_order_release); as being a 2-way barrier like atomic_thread_fence(release), but it's not. It's just a release operation on a dummy variable that no other thread ever accesses (I assume).
See https://preshing.com/20120913/acquire-and-release-semantics/ for a description in terms of local reordering of accesses to coherent shared memory. Acquire and release operations can reorder in one direction each. The dummy release store can reorder with any later operations except ones that are themselves release or stronger, so it might as well not exist.
What would be approximately equivalent (strictly stronger I think) is:
// Any earlier operations can't reorder past the fence
std::atomic_thread_fence(std::memory_order_release);
// and later stores can't reorder before the fence
foo.exchange(bar, std::memory_order_acquire); // so this store is after any earlier ops
The load part of the exchange can still reorder with earlier loads/stores on other objects so it's not much stronger. (related: For purposes of ordering, is atomic read-modify-write one operation or two?)
Also fine would be foo.exchange(bar, release); thread_fence(acquire).
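For side-by-side comparison, the two fence-based variants can be wrapped as small helpers. This is only a sketch: the global foo and both function names are assumptions for illustration, not code from the question.

```cpp
#include <atomic>

std::atomic<int> foo{0};   // hypothetical shared variable

// Variant A: release fence, then an acquire exchange.
// Earlier operations can't reorder past the fence, and later stores
// (including the store side of the exchange) can't reorder before it.
int exchange_fence_then_acquire(int bar) {
    std::atomic_thread_fence(std::memory_order_release);
    return foo.exchange(bar, std::memory_order_acquire);
}

// Variant B: release exchange, then an acquire fence.
// The fence gives acquire ordering to the (otherwise relaxed) load side
// of the exchange with respect to later operations.
int exchange_release_then_fence(int bar) {
    int old = foo.exchange(bar, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_acquire);
    return old;
}
```

Both helpers return the previous value of foo, which is the value whose writer you want to sync-with.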
Another answer suggests foo.exchange(bar, release); foo.load(acquire) would be equivalent, but it's not. The acquire load might sync-with a different thread than the one whose value the exchange saw.
If you're really not using the return value of exchange to either check if you should do something (if(sequence_num > x)), or figure out what or where you should access (e.g. a pointer or array index), its acquire semantics are unlikely to matter at all.
But consider a reader that does int idx = foo.exchange(bar, acq_rel); int tmp = arr[idx]; Replacing the acq_rel exchange with int idx = foo.exchange(bar, release); foo.load(acquire); (ignoring the value of that acquire load) wouldn't be equivalent. Only an acquire barrier (fence) would order the load side of the exchange wrt. later operations.
If a store from a third thread becomes visible between the exchange(release) and load(acquire), you don't sync-with the thread that stored the value your exchange saw, only the third thread that stored the value you're ignoring.
Consider a writer that did arr[i] = 123; foo.store(i, release); If a third thread did foo.store(0, relaxed); or whatever, the foo.load(acquire) would sync-with it, not the one that wrote arr[idx]. This is of course a contrived example, and dependency ordering would save you on real CPUs even though the load side of foo.exchange was relaxed, not consume. But ISO C++ formally guarantees nothing in that case. (And branching on the exchange result, instead of using it as part of a load or maybe store address, wouldn't let dependency ordering save you.)
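A minimal sketch of the safe version of this reader, where the exchange whose value gets used is itself acq_rel. The array size, the convention that 0 means "nothing published yet", and the function name read_published_slot are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the array element the reader loaded.
int read_published_slot() {
    int arr[8] = {};
    std::atomic<int> foo{0};          // assumption: 0 means "nothing published yet"
    int tmp = -1;

    std::thread writer([&] {
        arr[3] = 123;                             // plain store to the slot
        foo.store(3, std::memory_order_release);  // publish the index
    });
    std::thread reader([&] {
        int idx;
        do {
            // The load side of this exchange is acquire, so when it sees 3
            // it syncs-with the writer's release store.
            idx = foo.exchange(0, std::memory_order_acq_rel);
            if (idx == 0) std::this_thread::yield();
        } while (idx == 0);
        tmp = arr[idx];               // safe: happens-after arr[3] = 123
    });

    writer.join();
    reader.join();
    assert(tmp == 123);
    return tmp;
}
```

Swapping the exchange for exchange(release) plus a separate foo.load(acquire) would leave the load that produced idx relaxed, which is exactly the case where ISO C++ stops guaranteeing anything about arr[idx].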
If the third thread was also using exchange (even relaxed), that would create a release-sequence so your load would still sync-with the earlier writer as well. But a pure store doesn't guarantee that, breaking a release-sequence.
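The release-sequence case can be sketched as follows, assuming a third thread that uses a relaxed exchange on foo. The values 1 and 2 and the function name read_through_release_sequence are hypothetical, picked so the example terminates deterministically.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the data value the reader observed.
int read_through_release_sequence() {
    int data = 0;                     // plain, non-atomic data
    std::atomic<int> foo{0};
    int seen = -1;

    std::thread writer([&] {
        data = 123;
        foo.store(1, std::memory_order_release);   // heads the release sequence
    });
    std::thread third([&] {
        while (foo.load(std::memory_order_relaxed) != 1) {
            std::this_thread::yield();
        }
        // A relaxed RMW continues the release sequence headed by the
        // writer's store. A plain foo.store(2, relaxed) would not.
        foo.exchange(2, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        // Reads the value written by the RMW, so it syncs-with the writer.
        while (foo.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        seen = data;                  // safe: happens-after data = 123
    });

    writer.join();
    third.join();
    reader.join();
    assert(seen == 123);
    return seen;
}
```

With a plain relaxed store in the third thread instead of the exchange, the reader's acquire load would only see a relaxed store, and reading data would formally be a data race.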
On most CPUs, where stores can only become visible to other threads by committing to coherent cache, the writer had to wait for exclusive ownership of the cache line just like for an atomic RMW. So plain stores can also continue a release-sequence, letting an acquire load sync-with all previous release stores and RMWs to the object. But ISO C++ doesn't formally guarantee that, and I wouldn't bet on it being safe on PowerPC where store-forwarding between logical cores is a thing. Except that on PPC, an acquire load is done with asm barriers, which would also strengthen the load part of an exchange.
Still, if you're trying to understand the C++ formalism, it's important to understand that the load whose value you actually use needs to be acquire, or there needs to be an acquire fence (not just an acquire operation).