Atomic read-compare-write instructions from multiple cores at the same time (on the same cache line) do contend with each other, but it's up to hardware to sort that out. Hardware arbitration of atomic RMW instructions is a real thing in modern CPUs, and provides some degree of fairness so that one thread spinning on lock cmpxchg can't totally block other threads doing the same thing.
(Although spinning on lock cmpxchg like that is a bad design unless your retry could succeed without waiting for another thread to modify anything, e.g. a retry loop that implements fetch_or or similar can try again with the updated value of expected. But if you're waiting for a lock or flag to change, then after the initial CAS fails it's better to spin on an acquire or relaxed load and only retry the CAS when it might succeed.)
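A minimal C++ sketch of both patterns, assuming C++11 std::atomic. The names cas_fetch_or and SpinLock are hypothetical, chosen for illustration. The first is a CAS retry loop that can succeed on the very next attempt, because a failed compare_exchange_weak stores the value it actually saw into expected. The second is a "test-and-test-and-set" spinlock that, after a failed CAS, waits with plain relaxed loads and only attempts the CAS again once the lock looks free. In real code you would normally just call std::atomic::fetch_or; on x86 compilers typically emit lock or when the old value is unused, and a lock cmpxchg loop like this one when it is.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// fetch_or built from a CAS retry loop (hypothetical helper for illustration).
// A failed compare_exchange_weak writes the current value into `expected`,
// so the next attempt can succeed without waiting for any other thread.
unsigned cas_fetch_or(std::atomic<unsigned>& a, unsigned bits) {
    unsigned expected = a.load(std::memory_order_relaxed);
    while (!a.compare_exchange_weak(expected, expected | bits,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // `expected` now holds the freshly observed value; just retry.
    }
    return expected;  // old value, same contract as std::atomic::fetch_or
}

// Spinlock that only attempts the CAS when it might succeed (hypothetical class).
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        for (;;) {
            bool expected = false;
            if (locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed))
                return;  // acquired
            // Read-only wait: no atomic RMW hammering the cache line while
            // another thread holds the lock.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();  // a real lock might use a pause hint instead
            }
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed) && "unlock without lock");
        locked_.store(false, std::memory_order_release);
    }
};
```

The inner load loop is the design choice the answer recommends: the lock cmpxchg is only issued again after a load has seen the lock released, so a waiting thread doesn't keep requesting exclusive ownership of the line from the thread that holds the lock.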
There's no guarantee about what order they happen in, which is why you need to design your algorithm carefully so that its correctness depends only on each compare-and-exchange being atomic. (The ABA problem is a common pitfall.)
BTW, that entire block of pseudocode happens as a single atomic operation. Making a read-compare-write or read-modify-write happen as a single atomic operation is much harder for the hardware than plain stores, which MESIF/MOESI handle just fine.
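A small C++ sketch, assuming C++11 std::atomic, to make "the whole block is one atomic operation" concrete. cas_semantics (a hypothetical name) spells out what compare-and-exchange does as ordinary, non-atomic code; std::atomic::compare_exchange_strong/weak (lock cmpxchg on x86) performs those same read, compare and write steps indivisibly. contended_cas_increments (also hypothetical) has several threads contend on one variable; because each read-compare-write is atomic, no increment is lost, whatever order the hardware lets them happen in.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What compare-and-exchange does, written out step by step.
// This plain version is NOT atomic: another thread could write *obj
// between the read and the write. The hardware instruction does all
// three steps as one indivisible operation.
bool cas_semantics(int* obj, int* expected, int desired) {
    int current = *obj;          // read
    if (current == *expected) {  // compare
        *obj = desired;          // write
        return true;
    }
    *expected = current;         // report what was actually there
    return false;
}

// Several threads incrementing one counter with a CAS retry loop.
// The final count is exact because each CAS is atomic.
int contended_cas_increments(int threads, int per_thread) {
    assert(threads > 0 && threads <= 16);
    std::atomic<int> counter{0};
    std::thread pool[16];
    for (int t = 0; t < threads; ++t) {
        pool[t] = std::thread([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                int expected = counter.load(std::memory_order_relaxed);
                // On failure, `expected` is refreshed and we retry.
                while (!counter.compare_exchange_weak(expected, expected + 1,
                                                      std::memory_order_relaxed)) {
                }
            }
        });
    }
    for (int t = 0; t < threads; ++t) pool[t].join();
    return counter.load();
}
```

If cas_semantics were used from several threads on the same int instead, it would be a data race and updates could be lost; that gap between the read and the write is exactly what the atomic instruction closes.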
Are you sure? I thought it's unsafe to do that because, for example, x86 doesn't guarantee atomicity of writes to unaligned DWORDs.
lock cmpxchg makes the operation atomic regardless of alignment. It's potentially a lot slower for unaligned operands, especially on cache-line splits, where atomically modifying a single cache line isn't enough.
See also Atomicity on x86, where I explain what it means for an operation to be atomic.