Why does `add rax, 4294967` make `48 05 37 89 41 00` in x86 hex?

Question

This expression:

add rax, 4294967

Gets compiled down to these bytes:

48 05 37 89 41 00

I generated this expression to see what this instruction opcode type would look like:

REX.W + 05 id

I am guessing that the trailing 37 89 41 00 is the direct 32-bit encoding of 4294967, so that is id. Then the 05 is in there. The only thing that remains is the 48, which is somehow related to REX.W in REX.W + 05 id.
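A quick way to check that guess is to pack the number as a 32-bit little-endian integer and compare the bytes. This is only a minimal sketch using Python's standard `struct` module; the variable names are illustrative and not from any assembler.

```python
import struct

value = 4294967

# "<i" packs a signed 32-bit integer in little-endian byte order,
# i.e. least significant byte first, which is how x86 stores immediates.
imm32 = struct.pack("<i", value)
imm32_hex = " ".join(f"{b:02X}" for b in imm32)

print(hex(value))  # 0x418937
print(imm32_hex)   # 37 89 41 00
```

So 4294967 is 0x00418937, and its four bytes written low byte first are exactly the trailing `37 89 41 00`.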

Here's the background reading I did (over and over again) on REX.W.

Background Reading

REX prefixes are a set of 16 opcodes that span one row of the opcode map and occupy entries 40H to 4FH.

Where is this opcode map they speak of?

REX bits are defined like this:

0100
W
  0 = Operand size determined by CS.D
  1 = 64 Bit Operand Size
R
  Extension of the ModR/M reg field
X
  Extension of the SIB index field
B
  Extension of the ModR/M r/m field, SIB base field, or Opcode reg field
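Read as a bit layout, that definition can be sketched in a few lines of Python. The helper name `rex_prefix` is hypothetical and only illustrates the layout described above: a fixed high nibble of 0100 followed by the W, R, X and B bits.

```python
def rex_prefix(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte: fixed 0100 high nibble, then W, R, X, B."""
    return 0b0100_0000 | (w << 3) | (r << 2) | (x << 1) | b

# Every combination of the four flag bits, in ascending order.
all_rex = [
    rex_prefix(w, r, x, b)
    for w in (0, 1)
    for r in (0, 1)
    for x in (0, 1)
    for b in (0, 1)
]

print(f"{rex_prefix(w=1):#04x}")                 # 0x48
print(f"{all_rex[0]:#04x}..{all_rex[-1]:#04x}")  # 0x40..0x4f
```

The sixteen possible values are the 40H to 4FH row mentioned in the quote above, and setting only W gives 48H.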

In 3.1.1.1 Opcode Column in the Instruction Summary Table, we have:

REX.W — Indicates the use of a REX prefix that affects operand size or instruction semantics. The ordering of the REX prefix and other optional/mandatory instruction prefixes are discussed Chapter 2. Note that REX prefixes that promote legacy instructions to 64-bit behavior are not listed explicitly in the opcode column.

It affects operand size if REX.W is set (correct?), but REX.W doesn't affect instruction semantics by extending the ModR/M, SIB, or Opcode reg field; that is what R, X, and B are for, correct?

I didn't see this part: The ordering of the REX prefix and other optional/mandatory instruction prefixes are discussed Chapter 2.

Later we have:

Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.

Question

How do you get 48 from REX.W?

This is an I operand encoding, which shows:

AL/AX/EAX/RAX

There is no ModR/M byte. But I'm guessing that, because we're using RAX, which is a 64-bit register, the instruction implicitly needs a REX prefix.

If I take the magic 0100 for REX, plug in 1 for REX.W and 0s for the rest, that gives us 01001000, which is 72, not 48. Hmm...

Looking back at tables 2-1 through 2-3, the value 48 doesn't seem to be associated with RAX anywhere.

So I'm at a loss on how they got this 48. How did they get the 48?

Generically, how do you calculate the value of REX.W when you see it in the opcode field in the Intel docs?

Ah I just noticed it is somehow related to 40H to 4FH. Hmm...

`0x48` *is* a REX prefix with the W bit set, and the others clear, of course. https://wiki.osdev.org/X86-64_Instruction_Encoding#Encoding. In decimal, yes that's 72, but machine code bytes are invariably shown in hex, for obvious reasons, often without cluttering things up by putting a 0x in front of each one, or an `h` suffix, in disassembly. Stuff like `48` being `0x48` in this context is something you'll have to get used to, to make sense of the tools you're using, like `ndisasm` or `objdump -d` — Peter Cordes, Jan 28 '21 at 05:00
`0x48` isn't associated with RAX at all; it's the REX prefix. The `0x5` opcode is what makes (part of or whole) RAX the implicit destination. (The "I" format for `add` in its table, so yes this is `add rax, sign_extended_imm32`. [How to read the Intel Opcode notation](https://stackoverflow.com/a/53976236) somewhat covers that.) — Peter Cordes, Jan 28 '21 at 05:17
@PeterCordes Funnily enough, I have the POWER reference manual open right now and it does actually show opcodes in decimal. It's quite a strange sight. — fuz, Jan 28 '21 at 12:58
@fuz: IBM also counts bits backwards, with MSB=0, so the low bits get renumbered in 64-bit mode. I wouldn't have guessed they'd show opcodes in decimal, but if anyone was going to do it, IBM surprised me the least. — Peter Cordes, Jan 28 '21 at 18:05
Also, in my 2nd comment, I meant to link [How to determine if ModR/M is needed through Opcodes?](https://stackoverflow.com/q/55312459) for the description of "I" format vs. "MI" format as keys to the operand-encoding table on the same page (modrm with the destination encoded by M, and an immediate source, in that order.) — Peter Cordes, Jan 28 '21 at 18:08
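Putting the comments together, the whole byte sequence can be reproduced with a small sketch. The function name `encode_add_rax_imm32` is made up for illustration and handles only this one form (`REX.W + 05 id`); it is not a general assembler, and `struct.pack` will raise an error if the immediate does not fit in a signed 32-bit value.

```python
import struct

def encode_add_rax_imm32(imm):
    """Encode `add rax, imm32` using the REX.W + 05 id form."""
    rex_w = 0b0100_1000   # 0100 WRXB with W=1, R=X=B=0
    opcode = 0x05         # ADD with the accumulator as implicit destination
    # The immediate is stored as 4 bytes, little-endian, and is
    # sign-extended to 64 bits by the CPU when the instruction runs.
    return bytes([rex_w, opcode]) + struct.pack("<i", imm)

encoded = encode_add_rax_imm32(4294967)
encoded_hex = " ".join(f"{b:02X}" for b in encoded)

print(encoded_hex)         # 48 05 37 89 41 00
print(0b0100_1000, 0x48)   # 72 72
```

The second line of output shows the source of the confusion in the question: binary 01001000 is 72 in decimal and 48 in hex, so the two are the same byte.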
