I'm struggling with the exact semantics of the ARM STLR.
According to the documentation, it has release semantics. So for an STLR store, you would get:
[StoreStore][LoadStore]
X=r1
Whereby X is memory and r1 is some register.
The problem is that a release store followed by an acquire load fails to provide sequential consistency:
[StoreStore][LoadStore]
X=r1
r2=Y
[LoadLoad][LoadStore]
In the above case, X=r1 and r2=Y are allowed to be reordered. To make this sequentially consistent, a [StoreLoad] needs to be added:
[StoreStore][LoadStore]
X=r1
[StoreLoad]
r2=Y
[LoadLoad][LoadStore]
You normally put this fence on the store side, because loads are more frequent.
On x86, plain stores are release stores and plain loads are acquire loads. The [StoreLoad] can be implemented with an MFENCE or with LOCK ADDL $0,(%RSP), as is done in the HotSpot JVM.
When looking at the ARM documentation, it seems that an LDAR has acquire semantics, so that would be [LoadLoad][LoadStore].
But the semantics of the STLR are vague. When I compile a C++ atomic store using memory_order_seq_cst, there is just an STLR; there is no DMB. So it seems that the STLR has much stronger memory ordering guarantees than a release store. To me it seems that, at the fence level, an STLR is equivalent to:
[StoreStore][LoadStore]
X=r1
[StoreLoad]
Could someone shed some light on this?