I was looking at the code GCC generates for atomic RMW operations and noticed something odd: on AArch64, RMW operations such as fetch_add can be partially reordered with relaxed loads.
On AArch64, the following code may be generated for value.fetch_add(1, seq_cst):
.L1:
ldaxr x1, [x0]
add x1, x1, 1
stlxr w2, x1, [x0]
cbnz w2, .L1
However, loads and stores that come before the ldaxr can be reordered past it, and loads and stores that come after the stlxr can be reordered before it (see here). GCC doesn't add fences to prevent this. Here's a small piece of code demonstrating this:
void partial_reorder(std::atomic<int> loader, std::atomic<int> adder) {
loader.load(std::memory_order_relaxed); // can be reordered past the ldaxr
adder.fetch_add(1, std::memory_order_seq_cst);
loader.load(std::memory_order_relaxed); // can be reordered before the stlxr
}
which generates:
partial_reorder(std::atomic<int>, std::atomic<int>):
ldr w2, [x0] @ reordered down
.L2:
ldaxr w2, [x1]
add w2, w2, 1
stlxr w3, w2, [x1]
cbnz w3, .L2
ldr w0, [x0] @ reordered up
ret
In effect, the loads can be partially reordered with the RMW operation - they occur in the middle of it.
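For anyone who wants to reproduce the listing, here is a minimal compilable variant of partial_reorder. Taking the atomics by reference, returning the loaded values, and the name partial_reorder_ref are my own assumptions, made only so the function can be called and the loads are visibly used. Whether you get the ldaxr/stlxr loop depends on the GCC version and flags: with LSE enabled (for example -march=armv8.1-a) or with outline atomics, fetch_add may compile to a single instruction or a helper call instead.

```cpp
#include <atomic>

// Same body as partial_reorder, but taking the atomics by reference
// (an assumption) so that the function can actually be called.
int partial_reorder_ref(std::atomic<int>& loader, std::atomic<int>& adder) {
    int before = loader.load(std::memory_order_relaxed);  // may sink below the ldaxr
    adder.fetch_add(1, std::memory_order_seq_cst);        // ldaxr/stlxr loop when LSE is not used
    int after = loader.load(std::memory_order_relaxed);   // may hoist above the stlxr
    return before + after;  // returned only so the loads are visibly used
}
```

Something like g++ -O2 -S -march=armv8-a -mno-outline-atomics should show the exclusive-load/store loop on an AArch64 toolchain, though I have only the listing above to go by.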
So, what's the big deal? What am I asking?
It seems strange that an atomic operation is divisible as such. I couldn't find anything in the standard preventing this, but I had believed that there was a combination of rules that implied operations are indivisible.
It seems like this doesn't respect acquire ordering. If I perform a load or store directly after this operation, I could see store-load or store-store reordering between the store half of the fetch_add and the later operation, meaning that the later memory access is at least partially reordered ahead of the acquire operation. Again, I couldn't find anything in the standard explicitly saying that this isn't allowed, and acquire is defined in terms of the load, but my understanding was that acquire ordering applied to the entire operation and not just part of it. A similar scenario applies to release, where an earlier access is reordered past the ldaxr.
This one may be stretching the ordering definitions a bit more, but it seems invalid that two operations, one before and one after a seq_cst operation, can be reordered past each other. This could(?) happen if the bordering operations each reorder into the middle of the RMW and then pass each other.
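To make the last scenario concrete, here is a rough store-buffering style litmus test I would use to probe it. It is only a sketch: run_litmus, LitmusResult, and all variable names are hypothetical, the yield-based spin barrier is a simplification, and a count of zero proves nothing about what the hardware or the standard allows. Each thread does a relaxed store, a seq_cst fetch_add on its own variable, then a relaxed load of the other thread's variable; a trial where both loads return 0 would mean the store and the load effectively passed each other across the RMW.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct LitmusResult {
    int trials;     // number of trials run
    int both_zero;  // trials in which both threads loaded 0
    int rmw_total;  // total fetch_add calls applied to a and b
};

LitmusResult run_litmus(int trials) {
    assert(trials > 0);
    std::atomic<int> x{0}, y{0};    // accessed with relaxed ordering only
    std::atomic<int> a{0}, b{0};    // targets of the seq_cst RMWs, one per thread
    std::atomic<int> start{0};      // main publishes the current trial number here
    std::atomic<int> finished{0};   // workers count themselves done here
    std::vector<int> r1(trials), r2(trials);

    auto worker = [&](std::atomic<int>& mine, std::atomic<int>& other,
                      std::atomic<int>& rmw, std::vector<int>& out) {
        for (int t = 1; t <= trials; ++t) {
            // Wait until main has reset the variables and opened trial t.
            while (start.load(std::memory_order_acquire) != t)
                std::this_thread::yield();
            mine.store(1, std::memory_order_relaxed);            // op before the RMW
            rmw.fetch_add(1, std::memory_order_seq_cst);         // the RMW in question
            out[t - 1] = other.load(std::memory_order_relaxed);  // op after the RMW
            finished.fetch_add(1, std::memory_order_release);
        }
    };
    std::thread t1([&] { worker(x, y, a, r1); });
    std::thread t2([&] { worker(y, x, b, r2); });

    for (int t = 1; t <= trials; ++t) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        finished.store(0, std::memory_order_relaxed);
        start.store(t, std::memory_order_release);  // release both workers
        while (finished.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();
    }
    t1.join();
    t2.join();

    int both_zero = 0;
    for (int i = 0; i < trials; ++i)
        if (r1[i] == 0 && r2[i] == 0) ++both_zero;
    return {trials, both_zero, a.load() + b.load()};
}
```

Calling run_litmus with a large trial count (built with -pthread) and looking at both_zero is the idea. The yields keep it from hanging on machines with few cores, at the cost of making any reordering much less likely to show up, so a tighter spin would probably be needed for a serious experiment.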