What is the difference in logic and performance between the x86 instructions LOCK XCHG and MOV+MFENCE for doing a sequentially consistent store? (We ignore the load result of the XCHG; compilers other than gcc use it for the store + memory barrier effect.)
Is it true that, for sequential consistency, during the execution of the atomic operation, LOCK XCHG locks only a single cache line, whereas MOV+MFENCE locks the whole L3 cache (LLC)?
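
For concreteness, here is a minimal C++11 sketch of the kind of source code that leads to the two instruction sequences I am asking about. The function names are mine and purely illustrative, and the instruction choices mentioned in the comments are what x86-64 compilers commonly emit, not something the standard guarantees. Note also that a relaxed store followed by a seq_cst fence is not formally identical to a seq_cst store in the C++ memory model; it is shown only because it is a common way to end up with a plain MOV followed by a full barrier.

```cpp
#include <atomic>
#include <cassert>

// Sequentially consistent store. On x86-64 this is commonly compiled to
// either XCHG with a memory operand (implicitly locked, so no LOCK prefix
// is needed) or MOV followed by MFENCE, depending on the compiler.
void store_seq_cst(std::atomic<int>& target, int value) {
    target.store(value, std::memory_order_seq_cst);
}

// Plain store followed by a full fence. The store itself is an ordinary MOV;
// the fence may become MFENCE or a dummy locked instruction (assumption:
// this depends on compiler and version).
void store_then_fence(std::atomic<int>& target, int value) {
    target.store(value, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Atomic exchange whose loaded result is discarded, i.e. the
// "ignore the load result of the XCHG" case from the question.
void exchange_discard(std::atomic<int>& target, int value) {
    (void)target.exchange(value, std::memory_order_seq_cst);
}

// Single-threaded sanity check: all three variants must leave the
// stored value visible to a later load on the same thread.
int run_all_variants() {
    std::atomic<int> x{0};
    store_seq_cst(x, 1);
    assert(x.load() == 1);
    store_then_fence(x, 2);
    assert(x.load() == 2);
    exchange_discard(x, 3);
    return x.load();
}
```

Compiling this with optimizations and inspecting the assembly (for example with `-O2 -S`) should show which of the two sequences a given compiler picks for each function.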