Matching the Intel codes to disassembly output

Question

I'm starting to use the Intel reference page to look up and learn about the opcodes (instead of asking everything on SO). I'd like to check my understanding and ask a few questions about how the disassembly of a basic asm program lines up with the Intel instruction encodings.

Here is the program I have to compare various mov instructions into the rax-ish register (is there a better way to say "rax" and its 32- 16- and 8- bit components?):

.globl _start
_start:
    movq $1,    %rax    # move immediate into 8-byte rax (rax)
    movl $1,    %eax    # move immediate into 4-byte rax (eax)
    movw $1,    %ax     # move immediate into 2-byte rax (ax)
    movb $1,    %al     # move immediate into 1-byte rax (al)
    mov $60,    %eax
    syscall

And it disassembles as follows:

$ objdump -D file

file:     file format elf64-x86-64


Disassembly of section .text:

0000000000400078 <_start>:

  400078:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
  40007f:   b8 01 00 00 00          mov    $0x1,%eax
  400084:   66 b8 01 00             mov    $0x1,%ax
  400088:   b0 01                   mov    $0x1,%al

  40008a:   b8 3c 00 00 00          mov    $0x3c,%eax
  40008f:   0f 05                   syscall

Now, matching up to the Intel codes from MOV, copied here:

Here is how far I can reconcile the four instructions:

mov $0x1,%al --> b0 01
YES, Intel states code is b0 [+ 1 byte for value] for 1-byte move immediate.
mov $0x1,%eax --> b8 01 00 00 00
YES, Intel states code is b8 [+ 4 bytes for value] for 4-byte move immediate.
mov $0x1,%ax --> 66 b8 01 00
NO, Intel states code is b8 not 66 b8.
mov $0x1,%rax --> 48 c7 c0 01 00 00 00
N/A, 32 bit instructions only. Not listed.

From this, my questions are:

Why doesn't the mov $0x1,%ax match up?
Is there the same table for 64-bit codes, or what's the suggested way to look that up?
Finally, how do the codes adjust when the register changes? For example, if I want to move a value to %ebx or %r11 instead, how do I calculate the 'code-adjustment'? It looks like this lookup table only gives (I think?) the eax register for the 'register example codes'.

The entries leave it up to you to use an operand-size prefix (`66`) or not to get 16-bit operand size in 32 and 64-bit mode, or 32-bit operand-size in 16-bit mode. `66` is a prefix not an opcode. `48` is a REX prefix with W=1. Again, look up the opcode after the prefix. — Peter Cordes, Sep 13 '20 at 20:09
Also, for 64-bit, stop looking at an ancient version of Intel's manual. https://www.felixcloutier.com/x86/mov is scraped from a current version of Intel's PDF. Or since you need the extra info on understanding the manual entries, just read Intel's actual PDF, including the intro chapters in vol.2. Links to that and more in https://stackoverflow.com/tags/x86/info — Peter Cordes, Sep 13 '20 at 20:29
@PeterCordes oh that's so much better thanks for linking that (also it supports searching within the html, the other version didn't). — carl.hiass, Sep 13 '20 at 20:30
Also related: [How many ways to set a register to zero?](https://stackoverflow.com/a/32673696) mentions 3 different forms of `mov` to a 32 or 64-bit register. — Peter Cordes, Sep 13 '20 at 20:30

Chris Dodd · Accepted Answer · 2020-09-13T20:48:28.150

You're missing the (concept of) prefix "opcodes" that change the meaning of the following instruction. Volume 2, sections 2.1.1 and 2.2.1 of the IA32 manual cover this. From 2.1.1 we get:

Operand-size override prefix is encoded using 66H (66H is also used as a mandatory prefix for some instructions).

So the 66 prefix changes the operand size from the default 32-bit to 16-bit. Thus mov $1,%ax (16-bit) uses the same b8 opcode as mov $1,%eax (32-bit), just with the 66 prefix in front and a 2-byte immediate instead of a 4-byte one.
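To make the prefix rule concrete, here is a minimal Python sketch that rebuilds the three accumulator encodings from the objdump listing in the question. The function name `mov_imm_to_accumulator` is made up for illustration; it assumes 64-bit (or 32-bit) mode, where the default operand size is 32 bits, and it only handles the al/ax/eax destination.

```python
import struct

def mov_imm_to_accumulator(imm: int, bits: int) -> bytes:
    """Encode mov $imm into al, ax or eax (hypothetical helper, default 32-bit operand size)."""
    if bits == 8:
        # 8-bit moves use a different opcode (b0), so no prefix is needed.
        return bytes([0xB0]) + struct.pack("<B", imm)
    if bits == 16:
        # Same b8 opcode as the 32-bit form, preceded by the 66 operand-size prefix.
        return bytes([0x66, 0xB8]) + struct.pack("<H", imm)
    if bits == 32:
        # Default operand size: opcode b8 followed by a 4-byte little-endian immediate.
        return bytes([0xB8]) + struct.pack("<I", imm)
    raise ValueError("bits must be 8, 16 or 32")

for width in (32, 16, 8):
    print(width, mov_imm_to_accumulator(1, width).hex(" "))
```

The printed bytes can be compared against the listing at 40007f, 400084 and 400088 in the question.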

The last case (mov $1, %rax) is actually using a different instruction:

REX.W + C7 /0 id    MOV r/m64, imm32      Move imm32 sign extended to 64-bits to r/m64.

Here the destination is a ModRM-encoded register or memory operand (r/m64) rather than a register number folded into the opcode byte, so the instruction carries an extra ModRM byte (c0). In exchange it takes a 32-bit immediate that is sign-extended into the 64-bit register, so it needs only a 4-byte constant instead of an 8-byte one, and ends up 3 bytes smaller than the equivalent 48 b8 01 00 00 00 00 00 00 00.
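The size comparison can be checked with a short Python sketch. Both helper names are hypothetical, and both hard-code rax as the destination; the ModRM value c0 is mod=11 (register direct), reg=000 (the /0 opcode extension) and rm=000 (rax).

```python
import struct

def mov_imm32_sx_to_rax(imm: int) -> bytes:
    """REX.W + C7 /0 with a sign-extended 32-bit immediate (hypothetical helper)."""
    modrm = 0b11_000_000  # mod=11, reg=/0, rm=0 (rax) -> 0xc0
    return bytes([0x48, 0xC7, modrm]) + struct.pack("<i", imm)

def mov_imm64_to_rax(imm: int) -> bytes:
    """REX.W + B8 with a full 64-bit immediate (hypothetical helper)."""
    return bytes([0x48, 0xB8]) + struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)

short_form = mov_imm32_sx_to_rax(1)
long_form = mov_imm64_to_rax(1)
print(short_form.hex(" "), len(short_form))
print(long_form.hex(" "), len(long_form))
```

The first line printed is the encoding objdump shows at 400078; the second is the longer form the answer compares it against.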

edited Sep 13 '20 at 20:48

answered Sep 13 '20 at 20:22

Chris Dodd


thanks, could you please link to the part that says that so I can read a bit more about it? What are the available prefixes in addition to `66` ? – carl.hiass Sep 13 '20 at 20:29
what does `rex.w` mean? As far as I can tell, it seems to be ignored (so why would that be listed?) – carl.hiass Sep 13 '20 at 20:36
@carl.hiass `rex.w` changes the operand size to 64 bit. – fuz Sep 13 '20 at 20:39
REX.W is the 48 prefix -- sets the operand size to 64-bits instead of 32 – Chris Dodd Sep 13 '20 at 20:41
@fuz are the prefixes for reverse compatibility? `REX.W, REX.R, REX.X, and REX.B` are those 64, 32, 16, and 8 bit? – carl.hiass Sep 13 '20 at 20:41
@carl.hiass No. The REX prefixes exist in 64 bit mode only. `W` changes the operand size, `R`, `X`, and `B` provide an extra bit for the register numbers of **r**egister operand, **b**ase, and inde**x**, i.e. they turn the meaning of the register numbers 0–7 from rax, rcx, rdx, rbx, rsp, rbp, rsi, and rdi into r8, r9, r10, r11, r12, r13, r14, and r15 when set. – fuz Sep 13 '20 at 20:43
No -- the (default) operand size is 32-bit and the operand size (66) prefix makes it 16 while REX.W(48) makes it 64. 8-bit operands are different instructions, so need no prefix. See vol 2 section 2.2 of [the manual](https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4.html) for more details. – Chris Dodd Sep 13 '20 at 20:45
@carl.hiass As the answer says, 16 bit operand size is selected with a `66` prefix and 8 bit operand size is selected by the opcode. In SSE instructions, the prefixes `66`, `f2`, and `f3` instead do something else. – fuz Sep 13 '20 at 20:45
Refer to the *Intel Software Development Manuals* for details. There's a whole chapter on how the encoding works. It's a lot more productive to read the manual than to gather bits and pieces from random Stack Overflow questions. – fuz Sep 13 '20 at 20:46
@fuz agreed...but it's 5000 pages! Where should I start? – carl.hiass Sep 13 '20 at 20:48
@carl.hiass Check the table of contents. There's a chapter on the instruction format. Should be at the beginning of volume 2. Start there. It might be helpful to download a version of the manual split into volumes since a PDF file as large as the combined volumes file can overwhelm your PDF viewer. – fuz Sep 13 '20 at 20:48
@carl.hiass: https://wiki.osdev.org/X86-64_Instruction_Encoding is pretty good. If you don't know where to start in Intel's manuals, text search in a PDF viewer for something you want to know more about, like REX prefixes, and you'll find the section about it. And yeah, get the 3-volume PDFs so you can get vol.2 as its own PDF, not one huge PDF that's slow to search. A lot of that is OS development details that are basically irrelevant to learning the basics to understand how normal unprivileged instructions execute in a normal environment set up by a normal OS. – Peter Cordes Sep 13 '20 at 20:56
@ChrisDodd: Perhaps worth pointing out that `mov $1, %eax` has the exact same architectural effect as `mov $1, %rax`. Compilers will never use `mov $sign_extended_imm32, r/m64` for values that could also fit as a 32-bit *zero*-extended constant, when the destination is a register. NASM will even optimize `mov rax, 1` to `mov eax,1` for you. GAS won't, though. Related: [Difference between movq and movabsq in x86-64](https://stackoverflow.com/q/40315803) – Peter Cordes Sep 13 '20 at 20:58
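
On the third question (how the bytes change for %ebx or %r11), the register numbering and REX bits described in the comments can be sketched in Python. This is a rough illustration only: the names `REG64`, `rex`, `mov_imm32_to_reg32` and `mov_imm32_sx_to_reg64` are invented here, and the sketch covers just the two register-destination forms discussed in this thread.

```python
import struct

# Register numbers 0-15; 8-15 need an extra bit supplied by a REX prefix.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def mov_imm32_to_reg32(reg: str, imm: int) -> bytes:
    """B8+rd id: the low 3 bits of the register number are added to the opcode."""
    num = REG64.index(reg)
    out = bytearray()
    if num >= 8:
        out.append(rex(b=1))          # REX.B extends the register field in the opcode
    out.append(0xB8 + (num & 7))
    out += struct.pack("<I", imm)
    return bytes(out)

def mov_imm32_sx_to_reg64(reg: str, imm: int) -> bytes:
    """REX.W + C7 /0 id: the register goes in the ModRM rm field instead."""
    num = REG64.index(reg)
    prefix = rex(w=1, b=num >> 3)     # W=1 for 64-bit, B carries the register's high bit
    modrm = 0xC0 | (num & 7)          # mod=11, reg=/0, rm=low 3 bits
    return bytes([prefix, 0xC7, modrm]) + struct.pack("<i", imm)

print(mov_imm32_to_reg32("rbx", 1).hex(" "))     # mov $1, %ebx
print(mov_imm32_to_reg32("r11", 1).hex(" "))     # mov $1, %r11d
print(mov_imm32_sx_to_reg64("r11", 1).hex(" "))  # mov $1, %r11
```

The 64-bit register names are used only to look up the register number; the 32-bit helper encodes the corresponding 32-bit register (ebx, r11d). Assembling the same instructions and running objdump is a good way to confirm the output.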
