
I am doing some experiments with x64 assembly instructions using the Miasm framework. Consider the snippet below, where I disassemble and reassemble the bytecode of `LEA RAX, [RIP + 1]`:

```python
from miasm.analysis.machine import Machine

machine = Machine("x86_64").mn
ins = machine.dis(b"\x48\x8d\x05\x01\x00\x00\x00", 64)
print(ins)
# LEA        RAX, QWORD PTR [RIP + 0x1]

print(machine.asm(ins))
# [b'J\x8d\x05\x01\x00\x00\x00', b'K\x8d\x05\x01\x00\x00\x00',
#  b'H\x8d\x05\x01\x00\x00\x00', b'I\x8d\x05\x01\x00\x00\x00',
#  b'fH\x8d\x05\x01\x00\x00\x00', b'fI\x8d\x05\x01\x00\x00\x00',
#  b'fK\x8d\x05\x01\x00\x00\x00', b'fJ\x8d\x05\x01\x00\x00\x00']

# Disassemble every candidate encoding again
for i in machine.asm(ins):
    print(machine.dis(i, 64))
# LEA        RAX, QWORD PTR [RIP + 0x1]
# LEA        RAX, QWORD PTR [RIP + 0x1]
# (...)
# LEA        RAX, QWORD PTR [RIP + 0x1]
```

My questions are: why exactly are there so many bytecodes that correspond to the same instruction, and how do they differ? Is there any difference at all if I use one instead of another? My goal is to write a Python script that automates the generation of a rather complex assembly source file, so I'd like to make sure I won't run into problems because I "chose" the wrong one.

Peter Cordes
Katoptriss
  • Your `'J'`, `'K'`, and so on are all REX prefixes with different bits set, bits which aren't used by that addressing mode. It would be conventional to set only the W bit (for 64-bit operand size) when no high registers are involved, i.e. a `0x48` byte. Other than prefixes, there can be differences in performance, for example: [Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?](https://stackoverflow.com/q/51664369) / [How to force NASM to encode \[1 + rax\*2\] as disp32 + index\*2 instead of disp8 + base + index?](https://stackoverflow.com/q/48848230) – Peter Cordes Jun 23 '22 at 15:04
  • Another fun one: `xchg edx, ecx` has 2 cycles of latency in one direction, 1 cycle in the other: [Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures?](https://stackoverflow.com/q/45766444) – Peter Cordes Jun 23 '22 at 15:05
  • Terminology: this is *machine code*, not "bytecode". [Bytecode](https://en.wikipedia.org/wiki/Bytecode) is handled by a software interpreter, not by the hardware CPU directly, and does not encode actual machine instructions. – Nate Eldredge Jun 24 '22 at 06:04
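Following the convention from the first comment (no redundant prefixes, only the REX bits that are actually needed), a generator script could reduce the candidate list to one encoding. A minimal sketch, assuming the candidates are exactly the ones the question's `machine.asm(ins)` printed (hard-coded so it runs without Miasm); `pick_conventional` is a hypothetical helper, not part of Miasm:

```python
# Candidate encodings of LEA RAX, [RIP + 0x1], copied from the question's
# machine.asm(ins) output (hard-coded so this runs without Miasm installed).
candidates = [
    b'J\x8d\x05\x01\x00\x00\x00', b'K\x8d\x05\x01\x00\x00\x00',
    b'H\x8d\x05\x01\x00\x00\x00', b'I\x8d\x05\x01\x00\x00\x00',
    b'fH\x8d\x05\x01\x00\x00\x00', b'fI\x8d\x05\x01\x00\x00\x00',
    b'fK\x8d\x05\x01\x00\x00\x00', b'fJ\x8d\x05\x01\x00\x00\x00',
]


def pick_conventional(encodings):
    """Hypothetical helper: prefer the shortest encoding, then the one whose
    bytes compare smallest. For this instruction that drops the redundant
    0x66 prefix and keeps the REX byte with no unused X/B bits set (0x48)."""
    return min(encodings, key=lambda enc: (len(enc), enc))


chosen = pick_conventional(candidates)
print(chosen.hex(" "))  # 48 8d 05 01 00 00 00
```

In a real script this would be called as `pick_conventional(machine.asm(ins))`. It is only a heuristic: it does not promise to match the choice of any particular assembler, and it ignores the performance differences between encodings mentioned in the comments.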

1 Answer


Refer to the Intel Software Developer's Manuals for details on the instruction encoding.

What you can observe here is that the instruction begins with a REX prefix to indicate a 64-bit operand size. This REX prefix encodes four bits (W, R, X, and B), but only the R bit (which must be clear to select RAX instead of R8) and the W bit (which must be set to select 64-bit operation instead of 32-bit operation) are relevant. The other two bits extend the index (X) and base (B) registers, but your RIP-relative memory operand has neither.

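So whatever you set the X and B bits to, the result comes out the same, which is why four REX variants (`0x48`, `0x49`, `0x4A`, `0x4B`) are shown. Each of them also appears with a `0x66` operand-size prefix in front; that prefix has no effect here because REX.W takes precedence over it, which brings the total to eight encodings.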
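To check this on the actual list, the prefixes can be decoded by hand. A minimal sketch in plain Python, assuming each candidate is an optional `0x66` prefix followed by a REX byte (layout `0100WRXB`) and then the opcode, as in the list printed in the question; `decode_prefixes` is a hypothetical helper, not part of Miasm:

```python
# Candidate encodings of LEA RAX, [RIP + 0x1], copied from the question's
# machine.asm(ins) output (hard-coded so this runs without Miasm).
candidates = [
    b'J\x8d\x05\x01\x00\x00\x00', b'K\x8d\x05\x01\x00\x00\x00',
    b'H\x8d\x05\x01\x00\x00\x00', b'I\x8d\x05\x01\x00\x00\x00',
    b'fH\x8d\x05\x01\x00\x00\x00', b'fI\x8d\x05\x01\x00\x00\x00',
    b'fK\x8d\x05\x01\x00\x00\x00', b'fJ\x8d\x05\x01\x00\x00\x00',
]


def decode_prefixes(enc):
    """Hypothetical helper: split an encoding into an optional 0x66 prefix,
    the REX bits (0100WRXB) and the remaining opcode/ModRM/disp bytes."""
    has_66 = enc[0] == 0x66
    i = 1 if has_66 else 0
    rex = enc[i]
    if (rex & 0xF0) != 0x40:
        raise ValueError(f"expected a REX byte, got {rex:#04x}")
    return {
        "has_66": has_66,
        "W": (rex >> 3) & 1,  # 1 = 64-bit operand size
        "R": (rex >> 2) & 1,  # extends ModRM.reg (0 -> RAX, 1 -> R8)
        "X": (rex >> 1) & 1,  # extends SIB.index; there is no SIB byte here
        "B": rex & 1,         # extends ModRM.rm; not decoded for RIP-relative
        "rest": enc[i + 1:],
    }


decoded = [decode_prefixes(enc) for enc in candidates]
for enc, d in zip(candidates, decoded):
    print(enc.hex(" "), "66" if d["has_66"] else "--",
          "W=%d R=%d X=%d B=%d" % (d["W"], d["R"], d["X"], d["B"]))

# The fields that matter for this instruction are identical in all eight.
relevant = {(d["W"], d["R"], d["rest"]) for d in decoded}
print(len(relevant))  # 1
```

The printed table shows every combination of X and B, each with and without `0x66`, while W, R and the opcode bytes never change.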

fuz