This question is a follow-up/clarification to this one:
Does the MOV x86 instruction implement a C++11 memory_order_release atomic store?
It states that the MOV assembly instruction is sufficient to implement acquire-release semantics on x86. We do not need LOCK, fences, xchg, etc. However, I am struggling to understand how this works.
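A minimal sketch of the pattern in question, using the C++11 atomics from the linked question's title: a plain payload is published through a release store and read after an acquire load. Mainstream compilers typically emit a plain MOV on x86 for both `ready.store(..., std::memory_order_release)` and `ready.load(std::memory_order_acquire)`, though it is worth confirming that for your own toolchain. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are hypothetical, chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state: a plain (non-atomic) payload published via an atomic flag.
int payload = 0;
std::atomic<bool> ready{false};

// Writer: ordinary store of the data, then a release store of the flag.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Reader: spin on an acquire load of the flag, then read the data.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // busy-wait until the release store becomes visible
    }
    // The acquire load that saw `true` synchronizes-with the release store,
    // so this read is guaranteed to see 42 (and is not a data race).
    return payload;
}

// Hypothetical driver: runs one producer/consumer pair, returns what the reader saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // reset before any thread starts
    int observed = -1;
    std::thread reader([&observed] { observed = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return observed;
}
```

The question then amounts to this: which of the hardware ordering rules quoted below guarantee that a reader which sees `ready == true` also sees `payload == 42`, when both sides compile to ordinary MOVs?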
Intel's Software Developer's Manual, Vol. 3A, Chapter 8, states:
https://software.intel.com/sites/default/files/managed/7c/f1/253668-sdm-vol-3a.pdf
In a single-processor (core) system....
- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes, with the following exceptions:
But this applies to a single core. The multiprocessor section does not seem to say how the ordering of loads is enforced:
In a multiple-processor system, the following ordering principles apply:
- Individual processors use the same ordering principles as in a single-processor system.
- Writes by a single processor are observed in the same order by all processors.
- Writes from an individual processor are NOT ordered with respect to the writes from other processors.
- Memory ordering obeys causality (memory ordering respects transitive visibility).
- Any two stores are seen in a consistent order by processors other than those performing the stores.
- Locked instructions have a total order.
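For contrast, here is a sketch of the store-buffering litmus test using the same release/acquire orderings (the names `x`, `y`, `r1`, `r2` and `count_both_zero` are hypothetical). The C++ memory model permits the outcome `r1 == 0 && r2 == 0` here, because a release store followed by an acquire load of a different variable is not ordered; making all four operations `memory_order_seq_cst` forbids it. This corresponds to the reordering the full single-processor list in the SDM does allow (reads may be reordered with older writes to different locations). Whether the outcome actually appears depends on timing, and spawning fresh threads on every iteration makes the window small, so a count of zero proves nothing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus state: each thread stores to its own flag, then loads the other's.
std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;  // written by the threads, read only after join()

// Runs the test `iterations` times and counts how often both loads read 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);  // reset before the threads start
        y.store(0, std::memory_order_relaxed);
        std::thread t1([] {
            x.store(1, std::memory_order_release);
            r1 = y.load(std::memory_order_acquire);  // may be satisfied before x=1 is visible
        });
        std::thread t2([] {
            y.store(1, std::memory_order_release);
            r2 = x.load(std::memory_order_acquire);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // allowed by release/acquire, forbidden under seq_cst
        }
    }
    return both_zero;
}
```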
So how can MOV alone provide acquire-release semantics?