`lock cmpxchg` is atomic; nothing can happen during its execution (logically, anyway). Unlike an LL/SC implementation of `compare_exchange_weak`, it can't fail spuriously when another thread is writing inside the same cache line; it fails only if the compare actually fails. (Can CAS fail for all threads?)
`compare_exchange_strong` can be implemented as `lock cmpxchg` without requiring a loop.
Are you asking about the use-case where you load the old value, modify it, and then try to CAS a new value into place? (For example, to synthesize an atomic operation that the hardware doesn't provide directly, such as atomic FP add or atomic clear-lowest-bit.)
In that case, yes: the reason for a `cmpxchg` failure would be a race with another thread.
As @ネロク explained, you check the flag result of `cmpxchg` (ZF) to get the boolean result of `compare_exchange_strong`, i.e. to tell whether the compare part failed and whether the store was done or not. See Intel's manual entry for `cmpxchg`.
There are other use-cases for `lock cmpxchg` too, like waiting for a spinlock to become available by spinning on `lock.compare_exchange_weak(0, 1)`, repeatedly trying to CAS an unlocked variable into a locked state. Here a failed CAS isn't caused by a race; it simply means the lock is still held. (That's because you're passing a constant as the "expected" value, not what you just read from that location.)
This is generally not a good idea, AFAIK. I think it's normally better to spin read-only while waiting for a lock, and only try to take the lock with a CAS or an `xchg` once you've seen that it's available. (Using `xchg`, which is implicitly `lock`ed with a memory operand, and testing the integer result is potentially cheaper than `lock cmpxchg`, possibly keeping the cache line locked for fewer cycles. See Locks around memory manipulation via inline assembly.)
TL;DR: lockless programming requires precise language if you're going to reason about correctness.
`cmpxchg` fails and clears ZF (creating the `ne` not-equal condition) if EAX doesn't match the value in memory when it executes. The instruction itself doesn't know or care about races, just the exact state at the moment it executes.
You, as the programmer, have to worry about races and retrying when using it as a building block for larger atomic operations.