
I was originally trying to generate the bytes for an immediate move into a 64-bit register. The specific operation I wanted was

mov rdi, 0x1337

Using https://www.felixcloutier.com/x86/mov, the only non-sign-extended encoding I saw was

REX.W + B8+ rd io

This confused me, so I wrote a small assembly program to see what the assembler would generate

          global    _start

          section   .text
_start:   
          mov       rdi, 0x1337 
          syscall                           
          mov       rax, 60                 
          xor       rdi, rdi                
          syscall                           

I had to turn off optimizations so that there would be a move into a 64-bit register, so I assembled and linked with nasm -felf64 -O0 main.asm && ld main.o, which produced a.out. Looking at the output of objdump -M intel -d ./a.out, I see this line

48 bf 37 13 00 00 00    movabs rdi,0x1337  

That line looks nothing like

REX.W + B8+ rd io

to me? Additionally, after some research, I saw that the instruction is supposed to be 10 bytes. How do you get that from REX.W + B8+ rd io?

Peter Cordes
Happy Jerry
  • BTW why not `mov edi, 0x1337`? Same effect, smaller encoding. – harold Apr 06 '22 at 22:36
  • @harold I ended up doing that, but I'm just confused about the instruction encoding. `REX.W + B8+ rd io` looks very foreign to me, and I'd like to understand that. – Happy Jerry Apr 06 '22 at 22:57
  • You left out `objdump -d -w` to not line-wrap long instructions, like 10-byte `mov r64, imm64`. See also [Difference between movq and movabsq in x86-64](https://stackoverflow.com/q/40315803). (BTW, you can get that without disabling NASM optimization with `mov rdi, strict qword 0x1337`. See [Why NASM on Linux changes registers in x86\_64 assembly](https://stackoverflow.com/a/48597025) ). Both of those answers are relevant to understanding the available forms of MOV and how it works in machine code. – Peter Cordes Apr 07 '22 at 00:16
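
To put numbers on the "smaller encoding" remark above, here is a rough Python sketch that builds the two shorter forms by hand. The helper names are made up for illustration, and it only handles registers 0 through 7 (no REX.B); it is not how NASM does it internally.

```python
import struct

RDI = 7  # register number of (e/r)di

def mov_r32_imm32(reg, imm):
    # B8+rd id: opcode byte is B8 plus the register number,
    # followed by a 4-byte little-endian immediate.
    # Writing a 32-bit register zero-extends into the full 64-bit register.
    return bytes([0xB8 + reg]) + struct.pack("<I", imm)

def mov_r64_simm32(reg, imm):
    # REX.W + C7 /0 id: ModR/M with mod=11 (register direct), reg=0, rm=register.
    # The 4-byte immediate is sign-extended to 64 bits.
    modrm = 0xC0 | (0 << 3) | reg
    return bytes([0x48, 0xC7, modrm]) + struct.pack("<i", imm)

short = mov_r32_imm32(RDI, 0x1337)
extended = mov_r64_simm32(RDI, 0x1337)
print(short.hex(" "), "->", len(short), "bytes")
print(extended.hex(" "), "->", len(extended), "bytes")
```

So `mov edi, 0x1337` takes 5 bytes and the sign-extended `mov rdi, imm32` form takes 7, against 10 for the `imm64` form the question asks about.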

1 Answer


B8+ rd means the operand (a register) is encoded in the low 3 bits of the opcode, not in a ModR/M byte.

From the Intel Software Developer's Manual,

+rb, +rw, +rd, +ro — Indicated the lower 3 bits of the opcode byte is used to encode the register operand without a modR/M byte. The instruction lists the corresponding hexadecimal value of the opcode byte with low 3 bits as 000b. In non-64-bit mode, a register code, from 0 through 7, is added to the hexadecimal value of the opcode byte. In 64-bit mode, indicates the four bit field of REX.b and opcode[2:0] field encodes the register operand of the instruction. “+ro” is applicable only in 64-bit mode.

It looks like Intel wanted to use +ro for 64-bit operands encoded in that way, but then didn't actually do that. Not just in the mov lemma, but anywhere, as far as I could find. For example 64-bit push and pop could have had + ro, but they also have + rd. And "Indicated" is likely a typo; the rest of the text uses the present tense.

The (e/r)di register is number 7, and B8 + 7 = BF, explaining the opcode.

io stands for a qword immediate (o for octo, as in 8 bytes, perhaps?).

The REX prefix (40 for the base prefix, +8 to set the W bit, optionally +1 to set the B bit to access R8..R15), the opcode, no ModR/M byte, and the 8-byte immediate, add up to 10 bytes.
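
As a sanity check on that byte count, here is a minimal Python sketch that assembles `REX.W + B8+ rd io` by hand. The function name and register table are my own for illustration; only the bit layout comes from the description above.

```python
import struct

# Register numbers for the 64-bit general-purpose registers.
REG64 = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
    "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
    "r8": 8, "r9": 9, "r10": 10, "r11": 11,
    "r12": 12, "r13": 13, "r14": 14, "r15": 15,
}

def encode_mov_r64_imm64(reg, imm):
    num = REG64[reg]
    # REX prefix: 0x40 base, +8 for W (64-bit operand size),
    # +1 for B when the register is r8..r15 (bit 3 of the register number).
    rex = 0x40 | 0x08 | (num >> 3)
    # +rd: the low 3 bits of the register number are added to the opcode B8.
    opcode = 0xB8 + (num & 7)
    # io: 8-byte little-endian immediate.
    return bytes([rex, opcode]) + struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)

code = encode_mov_r64_imm64("rdi", 0x1337)
print(code.hex(" "), "->", len(code), "bytes")
print(encode_mov_r64_imm64("r9", 0x1337).hex(" "))
```

For `rdi` this gives `48 bf` followed by the eight immediate bytes, 10 bytes in total. The objdump line in the question shows only the first 7 of them because objdump wraps long instructions unless `-w` is given.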

harold
  • Where did the "40 for the base prefix, +8 to set the W bit" come from? How did you know to set the W bit and add 8 to it? – Happy Jerry Apr 06 '22 at 23:33
  • @HappyJerry `REX.W` in the encoding column indicates that the W bit needs to be set (makes sense: otherwise it would be a 32-bit `mov`), and it's the bit that corresponds to 8. REX with none of its flags set is 40, REX with the W bit set is the 48 which you saw. – harold Apr 06 '22 at 23:38
  • So REX is used to indicate the registers, and the W bit indicates whether you're using a 32-bit or 64-bit register? If I were using the register R9, the B flag would be set and the first byte would be 49. As for the ModR/M byte, this is used when a memory operand is used. So does this mean that whenever an r/m64, r/m32 or r/m16 form is used, there is the potential for a ModR/M byte to be used, if the r/mX is a memory location or offset? – Happy Jerry Apr 07 '22 at 00:04
  • @HappyJerry ModR/M is also commonly used to encode register operands, instructions with a `+rd` encoding are a minority by far. Most instructions require a modR/M byte. – harold Apr 07 '22 at 00:18
  • @HappyJerry: See [What is REX prefix in Instruction Encoding?](https://stackoverflow.com/q/68604377) and [How to read the Intel Opcode notation](https://stackoverflow.com/q/15017659) – Peter Cordes Apr 07 '22 at 00:23
  • @harold: Huh, I'd never noticed that usage of `o` for oct-*byte*. Elsewhere, e.g. the [`cqo` mnemonic](https://www.felixcloutier.com/x86/cwd:cdq:cqo), Intel uses `o` for 16-byte oct-*word*. – Peter Cordes Apr 07 '22 at 00:25
  • @PeterCordes Thanks. The only thing I can't see on those links is what `/0` means. For example `ADD r/m32, imm32` is `81 /0 id` – Happy Jerry Apr 07 '22 at 01:14
  • @HappyJerry: See my answer on [How to read the Intel Opcode notation](https://stackoverflow.com/a/53976236) for the `/0` part, using ModRM.r as extra opcode bits. – Peter Cordes Apr 07 '22 at 01:23
  • @PeterCordes So the /0 is the reg field in the ModR/M byte? In other words, the digit after the forward slash could be anything from /0 to /7? – Happy Jerry Apr 07 '22 at 04:54
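
For the `/0` question, a small Python sketch may help show where the digit ends up. It assumes a register-direct operand (mod = 11) and uses hypothetical helper names; memory operands, SIB bytes and REX are left out.

```python
import struct

def modrm(mod, reg, rm):
    # ModR/M layout: mod in bits 7..6, reg in bits 5..3, rm in bits 2..0.
    return (mod << 6) | (reg << 3) | rm

def encode_81_group(digit, reg_num, imm):
    # 81 /digit id: the /digit goes in the ModR/M reg field as extra opcode bits,
    # the rm field names the register operand (mod=0b11 means register direct),
    # and id is a 4-byte little-endian immediate.
    return bytes([0x81, modrm(0b11, digit, reg_num)]) + struct.pack("<I", imm & 0xFFFFFFFF)

EDI = 7
add_edi = encode_81_group(0, EDI, 0x1337)  # ADD r/m32, imm32 is 81 /0 id
sub_edi = encode_81_group(5, EDI, 0x1337)  # SUB r/m32, imm32 is 81 /5 id
print(add_edi.hex(" "))
print(sub_edi.hex(" "))
```

The same `81` opcode byte serves both instructions; only the reg field of ModR/M differs (`c7` for `/0`, `ef` for `/5`), which is why the digit can range from 0 to 7.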