There are two ways to implement an XCHG instruction.
a. using a hidden register. The 8085 has two hidden registers, but it is unknown whether it uses them for the XCHG instruction.
At the time of writing, the 8086 had not been reverse engineered, so it is not known how many hidden registers it has.
Temp = A
A = B
B = Temp
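As a toy sketch of method a, the register file can be modelled as a Python dict. The names `regs`, `TEMP` and `xchg_temp` are hypothetical and only mirror the three steps above; this says nothing about how the 8085 or 8086 actually wires its hidden registers.

```python
# Toy register file; "TEMP" stands in for a hidden register the programmer
# cannot address (hypothetical model, not a description of real silicon).
regs = {"A": 0x12, "B": 0x34, "TEMP": 0x00}

def xchg_temp(regs, x, y):
    """Swap regs[x] and regs[y] through the hidden TEMP register (method a)."""
    regs["TEMP"] = regs[x]  # step 1: Temp = A
    regs[x] = regs[y]       # step 2: A = B
    regs[y] = regs["TEMP"]  # step 3: B = Temp

xchg_temp(regs, "A", "B")
print(hex(regs["A"]), hex(regs["B"]))  # 0x34 0x12
```

Exchanging a register with itself is harmless here, since TEMP just receives and returns the same value.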
b. using the xor trick.
A = A xor B
B = A xor B
A = A xor B (Now A and B are swapped).
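The same toy model can sketch method b. Again `regs` and `xchg_xor` are hypothetical names. The second half shows a known property of the xor trick: if both operands are the same register, step 1 zeroes it, so a CPU using this method would have to handle that case separately.

```python
regs = {"A": 0x1234, "B": 0xABCD}

def xchg_xor(regs, x, y):
    """Swap regs[x] and regs[y] with the xor trick (method b), no temporary."""
    regs[x] ^= regs[y]  # step 1: A = A xor B
    regs[y] ^= regs[x]  # step 2: B = A xor B, B now holds the old A
    regs[x] ^= regs[y]  # step 3: A = A xor B, A now holds the old B

xchg_xor(regs, "A", "B")
print(hex(regs["A"]), hex(regs["B"]))  # 0xabcd 0x1234

# Pitfall: when both operands name the same register, x ^ x == 0 wipes it.
same = {"A": 0x1234}
xchg_xor(same, "A", "A")
print(same["A"])  # 0
```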
Note that methods a and b both take three steps, so instruction timing alone cannot reveal which one is used.
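Method a can be parallelized, because its first two steps do not need each other's results, whereas method b cannot, because every step needs the result of the previous one. The 8086, however, does not perform such optimizations.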
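One way to make the parallelization argument concrete is a toy dataflow model: each step is written as a destination plus the registers it reads, and the schedule length is the longest chain of read-after-write dependencies. This assumes write-after-read conflicts (such as step 2 of method a overwriting A, which step 1 reads) are removed by reading all sources before writing, or by register renaming. `METHOD_A`, `METHOD_B` and `critical_path` are hypothetical names for this sketch.

```python
# Each step is (destination, sources). Only true read-after-write dependencies
# are counted; WAR/WAW hazards are assumed to be removed (e.g. by renaming).
METHOD_A = [("TEMP", ["A"]), ("A", ["B"]), ("B", ["TEMP"])]
METHOD_B = [("A", ["A", "B"]), ("B", ["A", "B"]), ("A", ["A", "B"])]

def critical_path(steps):
    """Length of an as-soon-as-possible schedule when only RAW dependencies count."""
    ready = {}  # register -> cycle in which its latest value becomes available
    length = 0
    for dest, srcs in steps:
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = cycle
        length = max(length, cycle)
    return length

print(critical_path(METHOD_A))  # 2: steps 1 and 2 can share a cycle
print(critical_path(METHOD_B))  # 3: each step waits for the previous one
```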
On modern CPUs, an xchg is consistently half as fast as a mov and takes twice as many uops, which hints that a temp register is used. It can then be done in two steps, because the first two assignments of method a are fused into one using register renaming.
If the instruction were hardwired, it could run at the same speed as a mov, but this does not seem to be the case, presumably because it is rarely used.