Besides what the other correct answers say, another part of your premise is wrong.
Only a really dumb compiler would want to actually emit xchg every time the source swapped variables, whether there's an intrinsic or operator for it or not. Optimizing compilers don't just transliterate C into asm; they typically convert to an SSA internal representation of the program logic and optimize that, so they can implement it with as few instructions as possible (or really in the most efficient way possible: using multiple fast instructions can be better than a single slower one).
xchg is rarely faster than 3 mov instructions, and a good compiler can simply change its local-variable <-> CPU register mapping without emitting any asm instructions in many cases. (Or inside a loop, unrolling can often optimize away swapping.) Often you need only 1 or 2 mov instructions in asm, not all 3. For example, if only one of the C vars being swapped needs to stay in the same register, you can do:
# start: x in EAX, y in ECX
mov edx, eax
mov eax, ecx
# end: y in EAX, x in EDX
See also Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures?
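To see the register-allocation point from the C side, here's a minimal sketch (the function names are hypothetical, not from the question). It's Euclid's algorithm written with an explicit swap through a temporary on every iteration, which is the kind of source where you might naively expect an xchg.

```c
#include <assert.h>

/* Hypothetical example: Euclid's algorithm with an explicit
 * source-level swap on every loop iteration. */
unsigned gcd_swap(unsigned a, unsigned b)
{
    while (b != 0) {
        a %= b;            /* now a < b */
        unsigned tmp = a;  /* swap a and b through a temporary */
        a = b;
        b = tmp;
    }
    return a;
}

/* Small self-check so the behavior is pinned down. */
void gcd_swap_check(void)
{
    assert(gcd_swap(48, 18) == 6);
    assert(gcd_swap(17, 5) == 1);
    assert(gcd_swap(0, 7) == 7);
}
```

Compile it with something like gcc -O2 -S and read the loop body. I'd expect to see a couple of mov instructions around the div rather than an xchg, because the compiler just tracks which register currently holds which variable, but check the output for your own compiler and target.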
Also note that xchg [mem], reg is atomic (implicit lock prefix), and thus is a full memory barrier. That makes it much slower than 3 mov instructions, with a much higher impact on surrounding code because of the memory-barrier effect.
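The difference shows up in C when you contrast a plain exchange with memory against one that explicitly asks for atomicity. This is only a sketch, assuming a C11 compiler with <stdatomic.h>; the function names are hypothetical. On x86 I'd expect the plain version to be an ordinary mov load plus mov store, and the atomic_exchange version to be the case where an xchg with a memory operand (and its barrier cost) is actually what you asked for.

```c
#include <assert.h>
#include <stdatomic.h>

/* Plain exchange with a memory location: no atomicity requested,
 * so ordinary loads and stores are enough. */
int plain_exchange(int *slot, int desired)
{
    int old = *slot;
    *slot = desired;
    return old;
}

/* Atomic exchange: the read and the write must happen as one
 * indivisible step (sequentially consistent by default). */
int locked_exchange(atomic_int *slot, int desired)
{
    return atomic_exchange(slot, desired);
}

/* Both return the old value and leave the new one in memory. */
void exchange_check(void)
{
    int plain = 1;
    atomic_int shared = 1;

    assert(plain_exchange(&plain, 2) == 1);
    assert(plain == 2);

    assert(locked_exchange(&shared, 2) == 1);
    assert(atomic_load(&shared) == 2);
}
```

Both functions give the same result in single-threaded code; the point is that only the second one needs the expensive atomic instruction, so a compiler has no reason to pay for it in the first.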
If you do actually need to exchange registers, 3x mov is pretty good. It's often better than xchg reg,reg because of mov elimination, at the cost of more code size and a tmp reg.
There's a reason compilers essentially never use xchg to swap registers. If xchg was a win, compilers would look for it as a peephole optimization the same way they look for inc eax over add eax,1, or xor eax,eax instead of mov eax,0. But they don't.
(semi-related: swapping 2 registers in 8086 assembly language(16 bits))