Purpose of CMPXCHG instruction without LOCK prefix?

Question

I am reading Vol 3a of the Intel Developer manuals:

http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html

and on page 245 it implies that only the XCHG instruction has automatic bus locking. Instructions such as XADD and CMPXCHG do not automatically lock the system bus; to get that behaviour you need to prefix them with LOCK. To me, this suggests the instructions alone are not atomic across multiple CPU cores.

So what is the purpose of these instructions, if they aren't atomic?

Before I read the manual I expected these instructions would inherently be atomic at the CPU level. I thought that was the reason for combining the "compare" and "set" functions.

EDIT:

Could the reason be to enforce atomicity on a single CPU core?

It supports the holy grail of lockless programming on a processor with a strong memory model. Covered pretty well by [this post](http://stackoverflow.com/a/3855824/17034). — Hans Passant, Aug 19 '14 at 14:24
Why does lockless programming not require the LOCK prefix? From your link it seems the instructions are atomic on that CPU, but just not in terms of multiple CPUs/cores? — user997112, Aug 19 '14 at 14:34
See my answer on the linked duplicate, it's answering this exact question. (Although I'm not sure the original question I posted it under intended to ask that. Oops :P) — Peter Cordes, Aug 09 '18 at 10:08

score 1 · Answer 1 · edited Jan 14 '20 at 13:30

A compiler could optimize a conditional set, e.g.

if (n == 42) { n = 2; }

into a CMPXCHG (without LOCK prefix), if that's actually faster.

Furthermore CMPXCHG is atomic on a uni-processor system, where you still need to synchronize against interrupts (which can fire between any two instructions, but not in the middle of one instruction on the core).
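For illustration, here is a rough sketch of what such a conditional set could look like when written by hand with an unlocked CMPXCHG. It assumes GCC or Clang extended inline asm in the default AT&T syntax on x86; the helper name `set_if_equal` is made up for this example, and the plain C fallback for other targets is only there so the snippet still compiles. Without the LOCK prefix this is a single instruction, so an interrupt on the same core cannot split it, but it gives no atomicity guarantee against other cores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: if (*ptr == expected) *ptr = desired;
 * Returns true when the store of `desired` happened.
 * Uses CMPXCHG *without* LOCK: atomic with respect to interrupts on the
 * same core only, NOT with respect to other cores. */
static inline bool set_if_equal(uint32_t *ptr, uint32_t expected, uint32_t desired)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t prev = expected;          /* CMPXCHG compares EAX with *ptr */
    __asm__ __volatile__(
        "cmpxchgl %2, %1"              /* equal: *ptr = desired; else EAX = *ptr */
        : "+a"(prev), "+m"(*ptr)
        : "r"(desired)
        : "cc");
    return prev == expected;           /* EAX unchanged means the compare succeeded */
#else
    /* Assumption: non-x86 fallback, plain C with no atomicity at all. */
    if (*ptr == expected) {
        *ptr = desired;
        return true;
    }
    return false;
#endif
}
```

Calling `set_if_equal(&n, 42, 2)` with `n == 42` leaves `n == 2` and returns true; with any other value of `n` it leaves `n` unchanged and returns false. For code that must be safe across cores, you would add the LOCK prefix or use something like C11 `atomic_compare_exchange_strong` instead.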

You can read more about the use-cases for CMPXCHG without the LOCK prefix in Peter Cordes's answer.

edited Jan 14 '20 at 13:30

Peter Cordes


answered Jun 25 '15 at 12:26

Flow


How? This isn't a great example unless you actually show the assembly. – Evan Carroll Oct 20 '18 at 19:40
Interesting point. Usually branchless code ends up unconditionally writing an old or new value (e.g. load/cmov/store or SIMD load/blend/store). But I think `cmpxchg` without `lock` really is a conditional store, at least according to the manual (https://www.felixcloutier.com/x86/cmpxchg). So it doesn't "invent writes" by introducing a non-atomic read/rewrite in the compare-failed case. (If it did, that wouldn't be safe if another thread was writing `n` at the same time. The store could come after another thread's store and step on it.) – Peter Cordes Jan 14 '20 at 13:28
