Is there an atomic CAS instruction or equivalent in the AVX512 set?
I can't immediately find one, but I don't have the best Google-fu.
Other than lock cmpxchg16b (16 bytes), x86 doesn't have any guaranteed-atomic operations wider than 8 bytes. Aligned vector loads/stores are elementwise-atomic on current CPUs (i.e. no tearing within an 8-byte element), although it's not clear whether the documentation guarantees that.
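For reference, here is a minimal sketch of a 16-byte CAS built on lock cmpxchg16b using GCC/Clang inline asm. It assumes x86-64 and a compiler that supports flag-output constraints (`=@ccz`). The names `u128` and `cas16` are hypothetical. The instruction requires a 16-byte-aligned operand and faults otherwise, which is why the struct is declared `aligned(16)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte value type; cmpxchg16b requires 16-byte alignment. */
typedef struct { uint64_t lo, hi; } __attribute__((aligned(16))) u128;

/* Atomically: if *dst == *expected, store desired and return true.
 * Otherwise copy the current contents of *dst into *expected and return false.
 * cmpxchg16b compares RDX:RAX with the memory operand; on match it stores
 * RCX:RBX and sets ZF, otherwise it loads the memory into RDX:RAX. */
static bool cas16(u128 *dst, u128 *expected, u128 desired)
{
    bool ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1"
        : "=@ccz"(ok), "+m"(*dst), "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "memory");
    return ok;
}
```

On failure, `*expected` is refreshed with the value actually in memory. That makes the usual retry loop straightforward, the same way it works with 8-byte lock cmpxchg.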
Were you hoping for a 64-byte whole-cache-line CAS? There's no single instruction for that.
AVX512 alone doesn't provide that, but with TSX (transactional memory) you can roll your own: put a load + compare + store inside a transaction. IDK how expensive xbegin / xend is compared to lock cmpxchg.
You don't need AVX512 for it either; the whole transaction commits atomically or not at all, so you could use a pair of AVX2 load / compare instructions to implement a 64-byte CAS.
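Here is a rough sketch of that idea: a 64-byte CAS done with two 32-byte AVX2 loads/compares/stores inside an RTM transaction, via the `_xbegin` / `_xend` intrinsics from `<immintrin.h>`. The names `cas64_rtm`, `have_rtm_avx2` and the result enum are hypothetical. The sketch makes several assumptions:

- x86-64 with GCC or Clang.
- The target line is 64-byte aligned.
- The CPU actually exposes RTM. On many CPUs TSX is absent or disabled, and `xbegin` faults if RTM isn't supported, hence the CPUID check.

A transaction can abort for any reason, so this returns an "aborted" status instead of pretending to always succeed. Real code would need a non-transactional fallback path, such as a lock that the transaction also reads.

```c
#include <cpuid.h>
#include <immintrin.h>
#include <stdint.h>

enum cas64_result { CAS64_OK, CAS64_MISMATCH, CAS64_ABORTED };

/* Returns nonzero if the CPU reports AVX2 and RTM and the OS saves YMM state. */
static int have_rtm_avx2(void)
{
    unsigned a, b, c, d, lo, hi;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 27))) /* OSXSAVE */
        return 0;
    __asm__("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    if ((lo & 6) != 6) /* XMM and YMM state enabled in XCR0 */
        return 0;
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d))
        return 0;
    return (b & (1u << 5)) && (b & (1u << 11)); /* AVX2 and RTM bits */
}

/* Hypothetical 64-byte CAS: line must be 64-byte aligned;
 * expected/desired point to 64 bytes each (any alignment). */
__attribute__((target("rtm,avx2")))
static enum cas64_result cas64_rtm(void *line, const void *expected,
                                   const void *desired)
{
    /* Load the operands before the transaction to keep its footprint small. */
    const __m256i e0 = _mm256_loadu_si256((const __m256i *)expected);
    const __m256i e1 = _mm256_loadu_si256((const __m256i *)expected + 1);
    const __m256i d0 = _mm256_loadu_si256((const __m256i *)desired);
    const __m256i d1 = _mm256_loadu_si256((const __m256i *)desired + 1);
    __m256i *p = (__m256i *)line;

    if (_xbegin() == _XBEGIN_STARTED) {
        /* Compare both 32-byte halves as 64-bit elements. */
        __m256i c0 = _mm256_cmpeq_epi64(_mm256_load_si256(p), e0);
        __m256i c1 = _mm256_cmpeq_epi64(_mm256_load_si256(p + 1), e1);
        if (_mm256_movemask_epi8(_mm256_and_si256(c0, c1)) != -1) {
            _xend(); /* commit the (read-only) transaction */
            return CAS64_MISMATCH;
        }
        _mm256_store_si256(p, d0);
        _mm256_store_si256(p + 1, d1);
        _xend(); /* both stores become visible atomically */
        return CAS64_OK;
    }
    return CAS64_ABORTED; /* caller must retry or use a fallback path */
}
```

The whole-line compare and both stores commit together or not at all, so no single instruction wider than 32 bytes is needed. That is why AVX512 isn't required here.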