Does a LOCKed instruction serialize all operations on a multiprocessor system?
From the following description, it seems that the P6 and more recent CPUs guarantee this:
Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor.
For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.
But according to another section, on the P6 and more recent processors a LOCK operation may use cache locking instead of locking the bus, so how can it guarantee that all operations are serialized?
8.1.4 Effects of a LOCK Operation on Internal Processor Caches

For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.

For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called “cache locking.” The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.
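To make the atomicity part of that excerpt concrete, here is a rough C++ sketch of several threads doing a locked read-modify-write on one aligned counter. The function name `locked_increment_total` is made up for illustration, and I am assuming (not verifying here) that the compiler turns `fetch_add` into a LOCK-prefixed instruction on x86. It only demonstrates that no increment is lost; it says nothing about whether the bus or the cache line was locked.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` threads performs `per_thread`
// atomic increments on one shared counter and the final value is returned.
int locked_increment_total(int threads, int per_thread) {
    std::atomic<int> counter{0};  // naturally aligned, so it fits in one cache line
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write; assumed to compile to a
                // LOCK-prefixed add/xadd on x86.
                counter.fetch_add(1);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

Calling `locked_increment_total(4, 10000)` should return 40000, because each `fetch_add` is atomic regardless of which locking mechanism the processor uses internally.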
And in the memory ordering section, the following passage says that locked instructions have a total order:
In a multiple-processor system, the following ordering principles apply:
• Individual processors use the same ordering principles as in a single-processor system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from an individual processor are NOT ordered with respect to the writes from other processors.
• Memory ordering obeys causality (memory ordering respects transitive visibility).
• Any two stores are seen in a consistent order by processors other than those performing the stores.
• Locked instructions have a total order.
My confusion is: on the P6 and modern CPUs, does a LOCKed instruction serialize all load/store operations?
In other words, do locked instructions have a total order even when cache locking is used instead of a bus lock?
An example with four processors P1, P2, P3, P4 and two memory locations A, B:
P1 executes: a locked store that changes A,
P2 executes: a locked store that changes B.
If locked stores did not have a total order, the following could happen:
P3 observes: A changed --> B changed
P4 observes: B changed --> A changed
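The scenario above can be written as a small litmus test. This is only a sketch under assumptions: the function name `count_disagreements` is hypothetical, `exchange` stands in for the "locked store" (on x86 it is typically compiled to XCHG, which is implicitly locked), and all operations use the default sequentially consistent ordering. Seeing zero disagreements does not prove the hardware rule; it only fails to find a counterexample.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the four-processor scenario `iterations` times
// and counts the runs in which P3 saw "A changed, B not yet" while P4 saw
// "B changed, A not yet", i.e. the two readers disagreed on the store order.
int count_disagreements(int iterations) {
    int disagreements = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> A{0};
        std::atomic<int> B{0};
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

        std::thread p1([&] { A.exchange(1); });  // P1: locked store to A
        std::thread p2([&] { B.exchange(1); });  // P2: locked store to B
        std::thread p3([&] { r1 = A.load(); r2 = B.load(); });  // P3 reads A, then B
        std::thread p4([&] { r3 = B.load(); r4 = A.load(); });  // P4 reads B, then A

        p1.join();
        p2.join();
        p3.join();
        p4.join();

        // Results are read only after join(), so there is no data race on r1..r4.
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            ++disagreements;
        }
    }
    return disagreements;
}
```

If locked instructions really have a total order, `count_disagreements(n)` should return 0 for any `n`. With sequentially consistent atomics the C++ memory model itself also forbids the disagreeing outcome, so a nonzero result would point at a compiler or hardware problem rather than at this code.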