Intel XED decoded instruction doesn't perfectly match the 8086 assembly code

Question

I am playing around with XED, with the goal of writing a small emulator of the Intel 8086 that uses XED as its decoder. But when I write a small piece of asm (assembled with nasm):

[CPU 8086]

mov al, 0x7F
xor bx, bx
xchg bx, bx

cli
hlt

and try to display a few things to see whether I understand how XED works, I get this output:

0x0:0x0 (0x0)
MOV : length = 2
operand0: AL (REG0)
operand1: 7f (IMM0)

0x0:0x2 (0x2)
XOR : length = 3
operand0: BX (REG0)
operand1: BX (REG1)
operand2: (REG2)

0x0:0x5 (0x5)
XCHG : length = 3
operand0: BX (REG0)
operand1: BX (REG1)

0x0:0x8 (0x8)
CLI : length = 1
operand0: EFLAGS (REG0)

0x0:0x9 (0x9)
HLT : length = 1

I don't understand why I get 3 operands for xor and 1 operand for cli. In general, there are many cases where the operands displayed don't match the number of operands specified by Intel. What am I doing wrong?

Here is the code I used, in a gist (I did my best to make it as minimal as possible).

[edit]

Things are a little clearer now: I assembled xor bx, bx with nasm -f bin test.s and my program gives me this:

0x0:0x0 (0x0)
XOR : length = 2
operand0: BX (REG0)
operand1: BX (REG1)
operand2: FLAGS (REG2)

The length of xor is 2: that's right, since we are in 16-bit mode. There are 2 explicit operands, bx and bx: that's right. There is one implicit suppressed operand, flags (as @Peter Cordes said).

Everything looks good now

Peter Cordes · Accepted Answer · 2018-06-11T21:53:36.453

CLI clears the IF bit in EFLAGS, so that makes sense.

It looks like XED is including implicit operands, not just ones that are explicit in the machine code. i.e. all changes to the architectural state.

XOR writes flags, but XCHG doesn't. So REG2 is probably EFLAGS. But your code has only case XED_OPERAND_REG0 and ...REG1 in a switch statement, so probably it had a name (probably EFLAGS) but your code chose not to print it.

I was curious so I read the XED docs for you: XED classifies operands according to their visibility: either explicit (like bx in xor bx,bx) or implicit, or "IMPLICIT SUPPRESSED (SUPP)".

SUPP operands are:

not used in picking an encoding, (this is the difference from plain implicit)

not printed in disassembly,

not represented using operand bits in the encoding.

So you should check xed_operand_visibility_enum_t and only print the explicit operands.
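
The filtering idea can be sketched without linking against XED. This is only a model: `visibility_t`, `operand_t`, `count_explicit` and `print_explicit` are hypothetical stand-ins invented for illustration, not XED types or functions. In real code, I believe the equivalent check is comparing the result of `xed_operand_operand_visibility()` against `XED_OPVIS_EXPLICIT`, but verify that against the XED headers you build with.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for XED's visibility classes; NOT the real XED enum. */
typedef enum { VIS_EXPLICIT, VIS_IMPLICIT, VIS_SUPPRESSED } visibility_t;

/* Hypothetical operand record: a printable name plus its visibility. */
typedef struct {
    const char  *name;        /* e.g. "BX" or "FLAGS" */
    visibility_t visibility;
} operand_t;

/* Count the operands that would appear in the assembly source. */
size_t count_explicit(const operand_t *ops, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].visibility == VIS_EXPLICIT)
            count++;
    }
    return count;
}

/* Print only the explicit operands; implicit and suppressed ones are skipped. */
void print_explicit(const operand_t *ops, size_t n)
{
    size_t shown = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].visibility != VIS_EXPLICIT)
            continue;
        printf("operand%zu: %s\n", shown++, ops[i].name);
    }
}
```

With the three operands reported for `xor bx,bx` (BX, BX, and a suppressed FLAGS), `count_explicit` gives 2, and for `cli` (a single suppressed EFLAGS) it gives 0, which matches the operand counts in Intel's manuals.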

BTW, you seem to have assembled your code in 32-bit or 64-bit mode, because your 16-bit instructions like xor bx,bx are 3 bytes long. In 16-bit mode it would just be opcode + modrm. An operand-size prefix (66) added by the assembler (and correctly decoded by the disassembler) would explain it.
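
To make the length difference concrete, here is a minimal sketch of how `xor` between two 16-bit registers is encoded with opcode `0x31` in each mode. The function name `encode_xor_r16` and the `REG_*` constants are made up for this example, and it covers only 16-bit and 32-bit modes with register-to-register operands.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModRM byte. */
enum { REG_AX = 0, REG_CX = 1, REG_DX = 2, REG_BX = 3 };

/*
 * Encode `xor dst, src` for two 16-bit registers using opcode 0x31
 * (XOR r/m, r). mode_bits must be 16 or 32; anything else returns 0.
 * Writes at most 3 bytes to out and returns the number of bytes written.
 */
size_t encode_xor_r16(int mode_bits, unsigned dst, unsigned src, uint8_t out[3])
{
    size_t n = 0;

    if (mode_bits != 16 && mode_bits != 32)
        return 0;

    /* In 32-bit mode the default operand size is 32 bits, so a 16-bit
       operand needs the 0x66 operand-size prefix. */
    if (mode_bits == 32)
        out[n++] = 0x66;

    out[n++] = 0x31;                      /* opcode */
    /* ModRM: mod = 11 (register direct), reg = src, r/m = dst */
    out[n++] = (uint8_t)(0xC0 | ((src & 7u) << 3) | (dst & 7u));
    return n;
}
```

For `xor bx,bx` this yields `31 DB` (2 bytes) in 16-bit mode and `66 31 DB` (3 bytes) in 32-bit mode, which is the length your output shows.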

[CPU 8086] doesn't mean [BITS 16]. Unless you really want 16-bit mode for some reason, you should probably keep using 32-bit mode. (Your disassembler was already decoding it in the same mode your assembler was assembling for. Using BITS 16 would let you put 16-bit machine code in a 32-bit object file, which would just make it decode wrong.)
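
As a rough illustration of "decode wrong", the toy decoder below handles only a register-to-register `xor` (opcode `0x31`, optionally preceded by one `0x66` prefix) and shows the same bytes turning into different instructions depending on the mode. `decode_xor_rr` and the register tables are hypothetical names for this sketch; a real decoder like XED does far more.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static const char *const REG16[8] = {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"};
static const char *const REG32[8] = {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};

/*
 * Decode a register-to-register xor (opcode 0x31, ModRM mod == 3),
 * optionally preceded by a single 0x66 operand-size prefix.
 * mode_bits is 16 or 32. Writes the disassembly into text and returns
 * the instruction length in bytes, or 0 if the bytes are not supported.
 */
size_t decode_xor_rr(int mode_bits, const uint8_t *code, size_t len,
                     char *text, size_t text_size)
{
    size_t i = 0;
    int opsize = mode_bits;              /* default operand size follows the mode */

    if (mode_bits != 16 && mode_bits != 32)
        return 0;

    /* The 0x66 prefix toggles between 16-bit and 32-bit operand size. */
    if (i < len && code[i] == 0x66) {
        opsize = (mode_bits == 16) ? 32 : 16;
        i++;
    }

    if (len - i < 2 || code[i] != 0x31 || (code[i + 1] & 0xC0) != 0xC0)
        return 0;

    unsigned modrm = code[i + 1];
    const char *const *names = (opsize == 16) ? REG16 : REG32;
    /* r/m field is the destination, reg field is the source. */
    snprintf(text, text_size, "xor %s,%s",
             names[modrm & 7u], names[(modrm >> 3) & 7u]);
    return i + 2;
}
```

Feeding it `31 DB` gives `xor bx,bx` in 16-bit mode but `xor ebx,ebx` in 32-bit mode, while `66 31 DB` in 32-bit mode gives `xor bx,bx` again. That is why the assembler and the decoder have to agree on the mode.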

edited Jun 11 '18 at 21:53

answered Jun 11 '18 at 20:55


I get the same result when I use [BITS 32] instead of [CPU 8086] in my asm file – Adrien Jun 11 '18 at 21:04
@Adrien: Of course you do. `xor bx,bx` requires a `66` prefix in 32-bit mode, but *not* in `BITS 16`. I guess with your assembler, setting `[CPU 8086]` doesn't set `[BITS 16]`. Anyway, you probably don't want `BITS 16` unless your object-file format can represent that. Putting 16-bit machine code in a 32-bit object file will just make it decode wrong. (e.g. as `xor ebx,ebx`) – Peter Cordes Jun 11 '18 at 21:08
@Adrien Regarding XOR, the EFLAGS register corresponds to `XED_OPERAND_REG2`, which your code does not print in the switch statement. So you end up only printing the `(REG2)` part. – Hadi Brais Jun 11 '18 at 21:17
@PeterCordes I am creating a raw binary file. This project is for a presentation at school, so maybe I should try something easier: from what I understand, my problem is that I am trying to encode instructions for the Intel 8086 with the rules for the 386+? – Adrien Jun 11 '18 at 21:36
@HadiBrais I know, but the problem is that xor should only have 2 operands, not three – Adrien Jun 11 '18 at 21:37
@Adrien: What you have now appears to be working perfectly, so you should just do that. You didn't say *how* you're using NASM. That's odd, the default for flat binary is `BITS 16`, so `nasm -fbin foo.asm` should give you only a 2-byte `xor bx,bx`. But XED is decoding it as 32-bit. [`[CPU 8086]` tells NASM not to allow any instructions that weren't supported by 8086](https://www.nasm.us/doc/nasmdoc6.html#section-6.8) (like `movzx`), but it's separate from the `BITS` mode. It's *just* a filter on instructions, not modes, intended to help you avoid mistakes when writing portable code. – Peter Cordes Jun 11 '18 at 21:43
@Adrien: If you use `nasm -fbin -l/dev/stdout foo.asm`, you'll get a listing file where NASM shows you the machine code. Or use `ndisasm` on your flat binary. That will let you double-check instruction lengths to see how NASM encoded them. – Peter Cordes Jun 11 '18 at 21:44
@PeterCordes I used `nasm -f bin test.s`, and ndisasm gives me code that really looks like 16-bit mode. I think you were right: maybe XED is including implicit operands (edit: I just read what you added to your post) – Adrien Jun 11 '18 at 22:04
@Adrien: Use `.asm` for NASM, and `.S` only for GAS sources. So is XED not showing correct instruction lengths / addresses? Are you sure it fully supports 16-bit mode? – Peter Cordes Jun 11 '18 at 22:12
