2

The book I'm reading on 8086 assembly states that

XCHG AX, VAR

is equivalent to:

MOV DX, AX        ; DX is a temporary register
MOV AX, VAR
MOV VAR, DX

Is it really using a data register such as DX and internally executing the equivalent of three move instructions, or is it doing something else on an 8086? If the former is true, what happens to the data register's contents?

Alessandro M.
  • 2
    Suspect you can't get a thorough answer without seeing the processor designs, but I can certainly say that the `xchg` instruction doesn't actually trash any (user-visible) third register. I also suggest you read up on _register renaming_, as that may give a clue to what really happens "under the hood". – davmac Jan 31 '17 at 12:45
  • Are you certain about that first line? It should read `MOV DX, AX`. – David Hoelzer Jan 31 '17 at 12:50
  • 1
    No, it isn't really using a data register. – David Hoelzer Jan 31 '17 at 12:50
  • @DavidHoelzer Sorry, I was writing from my mobile and I did a (bad) typo – Alessandro M. Jan 31 '17 at 12:52
  • 11
    Even "way back when", before register renaming, there were internal "nameless" registers that would be used for operations that need a temporary place to put something. Now I don't know much about the 8086 specifically, but [here](http://www.righto.com/2013/03/register-file-8085.html) you can find a picture that includes the 8085's internal WZ registers. It stands to reason that its successor used some of the same tricks, even though it is of course hugely different. – harold Jan 31 '17 at 13:05
  • 3
    @harold I couldn't imagine a better answer to this question, unless someone had actually reverse-engineered an 8086. And even if they did and eventually came along to post that, your answer would still be useful in the mean time. So unless you think this question should be closed (and I don't see a vote to that effect), please consider promoting your comment to an answer. – Cody Gray - on strike Feb 01 '17 at 08:20

2 Answers

3

There are two ways to implement an XCHG instruction.

a. using a hidden register. The 8085 has 2 hidden registers, but it is unknown whether it used them for its exchange instruction. The 8086 has not been reverse engineered yet, so we don't know how many hidden registers it has.

Temp = A
A = B
B = Temp    

b. using the xor trick.

A = A xor B
B = A xor B
A = A xor B  (Now A and B are swapped).  

Note that methods a and b both take 3 steps, so instruction timing cannot tell us which one is used.
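For illustration, the two methods can be sketched in C on 16-bit values. The function names `swap_temp` and `swap_xor` are made up for this sketch, and nothing here claims to describe what the 8086 does internally. One practical difference shows up: the xor form zeroes the value when both operands are the same location, so the sketch guards against that with an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Method a: swap through a temporary (stand-in for a hidden register). */
static void swap_temp(uint16_t *a, uint16_t *b)
{
    uint16_t temp = *a;
    *a = *b;
    *b = temp;
}

/* Method b: xor trick, no temporary.
 * If a and b point at the same location, the first xor would zero it,
 * so aliasing is rejected up front. */
static void swap_xor(uint16_t *a, uint16_t *b)
{
    assert(a != b);
    *a ^= *b;   /* a = a0 ^ b0 */
    *b ^= *a;   /* b = b0 ^ a0 ^ b0 = a0 */
    *a ^= *b;   /* a = a0 ^ b0 ^ a0 = b0 */
}
```

Calling either function with pointers to two distinct `uint16_t` variables leaves their values exchanged; only `swap_temp` is also safe when both pointers are equal.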

Note that method a can be parallelized and method b cannot, but the 8086 does not do such fancy optimizations.

On modern CPUs an `xchg` is consistently half as fast as a `mov` and takes twice as many uops, which hints at a temp register being used. It can be done in 2 steps because the first two assignments are fused into one using register renaming.

If the instruction were hardwired it could run at the same speed as a `mov`, but this does not seem to be the case, presumably because it is rarely used.

Johan
  • or just implementing it in logic... which is the primary solution. – old_timer Feb 02 '17 at 13:41
  • 1
    I think you're only answering re: `xchg reg,reg`. The OP is talking about `xchg reg, var`, i.e. with memory, which means it's done atomically (implies a `lock` prefix). On original 8086 (no cache) this definitely means one read and one write. (And the OP's sequence isn't equivalent wrt. interrupts.) – Peter Cordes Oct 06 '18 at 13:04
  • For `xchg reg,reg`: Intel decodes it to 3 uops, none of which are handled with mov-elimination. It has 2 cycle latency for one way, and 1 cycle latency for the other. [Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures?](https://stackoverflow.com/q/45766444). AMD has 2-uop `xchg` since Bulldozer-family, and actually has zero latency on Ryzen. Ryzen has 1 uop `fxchg`, but Intel has relaxed that to 2 now that x87 is obsolete. – Peter Cordes Oct 06 '18 at 13:08
-1

3 to 4 clocks on a non-pipelined processor. There are two reads and two writes, so one of them could maybe be done in parallel.

read register
read external
swap (pure logic: route the signals, no time for xor or extra register stuff), in the same clock cycle as one of the reads.
write both if one is external; if both are registers, then an additional clock.

So that adds up to 3 to 4. If there were a temp register or some xors, it would take another two or three clocks.
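A rough C model of the read/read/write-both sequence above, assuming a made-up `Machine` struct holding only `ax`, `dx`, and one memory word `var`. The two locals stand in for whatever internal latches or signal routing the hardware uses; the sketch says nothing about clock counts. It does show the one register-visible difference from the three-`MOV` sequence in the question: `DX` is left untouched.

```c
#include <stdint.h>

/* Hypothetical toy machine: two registers and one memory word. */
typedef struct {
    uint16_t ax;
    uint16_t dx;
    uint16_t var;
} Machine;

/* XCHG AX, VAR modeled as: read both, then write both crosswise. */
static void xchg_ax_var(Machine *m)
{
    uint16_t from_reg = m->ax;   /* read register */
    uint16_t from_mem = m->var;  /* read external */
    m->ax  = from_mem;           /* write both, routed crosswise */
    m->var = from_reg;
}

/* The book's three-MOV sequence, which overwrites DX along the way. */
static void three_movs(Machine *m)
{
    m->dx  = m->ax;   /* MOV DX, AX  */
    m->ax  = m->var;  /* MOV AX, VAR */
    m->var = m->dx;   /* MOV VAR, DX */
}
```

Starting both functions from the same state, `ax` and `var` end up identical, but only `three_movs` leaves the old `ax` value in `dx`.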

old_timer
  • xchg takes 3 to 4 clock cycles. A mov takes 2 on the 8086. Ergo it's unlikely that special hardware was used for xchg, or the instruction would be faster. Even on a modern CPU xchg is much slower than a MOV. – Johan Feb 03 '17 at 07:11