Yes: mov to a register and then to memory, for immediates that won't fit in a sign-extended 32-bit value (unlike -1, aka 0xFFFFFFFFFFFFFFFF, which does fit). The "why" part is an interesting question, though:
Remember that asm only lets you do what's possible in machine code. Thus it's really a question about ISA design. Such decisions often involve what's easy for the hardware to decode, as well as encoding efficiency considerations. (Using up opcodes on rarely-used instructions would be bad.)
It's not designed to make things harder; it's designed to not need any new opcodes for mov. AMD was extending x86 to 64-bit and aiming not to need a whole separate decoder unit for different modes, and also wanted to limit 64-bit immediates to one special instruction format. mov is the only instruction that can ever use a 64-bit immediate at all (or a 64-bit absolute address, for load/store of AL/AX/EAX/RAX).
Check out Intel's manual for the forms of mov (note that it uses Intel syntax, destination first, and so will my answer). I also summarized the forms (and their instruction lengths) in "Difference between movq and movabsq in x86-64", as did @MargaretBloom in an answer to "What's the difference between the x86-64 AT&T instructions movq and movabsq?".
Allowing an imm64 along with a ModR/M addressing mode would also make it possible to run into the 15-byte upper limit on instruction length pretty easily: REX + opcode + imm64 is 10 bytes, and ModRM + SIB + disp32 is 6, for a total of 16. So mov [rdi + rax*8 + 1234], imm64 would not be encodable even if there were an opcode for mov r/m64, imm64.
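As a rough sketch of that byte-count arithmetic in Python (the field labels are just names I picked, and the instruction itself is hypothetical; no such encoding exists):

```python
# Byte counts for each field of the hypothetical instruction
# mov [rdi + rax*8 + 1234], imm64 (this encoding does not exist).
MAX_X86_INSTRUCTION_LENGTH = 15

hypothetical_fields = {
    "REX prefix": 1,
    "opcode": 1,
    "ModRM": 1,
    "SIB": 1,      # needed for the rax*8 index
    "disp32": 4,   # 1234 does not fit in a disp8
    "imm64": 8,
}

total_length = sum(hypothetical_fields.values())
print(total_length, total_length <= MAX_X86_INSTRUCTION_LENGTH)  # prints: 16 False
```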
And that's assuming they repurposed one of the 1-byte opcodes that were freed up by making some instructions invalid in 64-bit mode (e.g. aaa), which might be inconvenient for the decoders (and instruction-length pre-decoders) because in other modes those opcodes don't take a ModRM byte or an immediate.
movq is for the forms of mov with a normal ModRM byte to allow an arbitrary addressing mode as the destination (or as the source, for movq r64, r/m64). AMD chose to keep the immediate for these as 32-bit, same as with 32-bit operand size (see footnote 1).
These forms of mov use the same instruction format as other instructions like add. For ease of decoding, this means a REX prefix doesn't change the length of the rest of the instruction for these opcodes. Instruction-length decoding is already hard enough when the addressing mode is variable-length.
So movq is 64-bit operand-size but otherwise the same instruction format: mov r/m64, imm32 (becoming the sign-extended-immediate form, same as every other instruction which only has one immediate form), and mov r/m64, r64 or mov r64, r/m64.
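Here's a small hand-encoding sketch in Python of mov [rdi], imm32 (opcode C7 /0) at both operand sizes, to show that the only difference is the REX.W prefix byte. The helper name is mine, and it deliberately handles only the plain [rdi] addressing mode.

```python
import struct


def encode_mov_mem_rdi_imm32(imm: int, operand_size: int) -> bytes:
    """Encode mov dword/qword ptr [rdi], imm32 (opcode C7 /0)."""
    if operand_size not in (32, 64):
        raise ValueError("operand_size must be 32 or 64")
    if not -2**31 <= imm < 2**31:
        raise ValueError("immediate must fit in a signed 32-bit value")
    prefix = b"\x48" if operand_size == 64 else b""   # REX.W selects 64-bit
    opcode = b"\xC7"
    modrm = bytes([0b00_000_111])    # mod=00, reg=/0, rm=111 (rdi): [rdi]
    return prefix + opcode + modrm + struct.pack("<i", imm)


print(encode_mov_mem_rdi_imm32(-1, 32).hex())  # c707ffffffff
print(encode_mov_mem_rdi_imm32(-1, 64).hex())  # 48c707ffffffff
```

With 64-bit operand size the CPU sign-extends those 4 immediate bytes, so the second encoding stores all-ones to 8 bytes of memory.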
movabs is the 64-bit form of the existing no-ModRM short form mov reg, imm32. This one is already a special case (because of the no-ModRM encoding, with the register number in the low 3 bits of the opcode byte). Small positive constants can just use 32-bit operand-size for implicit zero-extension to 64-bit with no loss of efficiency (like 5-byte mov eax, 123 / AT&T mov $123, %eax in 32 or 64-bit mode). And having a 64-bit absolute mov is useful, so it makes sense that AMD did that.
Since there's no ModRM byte, it can only encode a register destination. It would take a whole different opcode to add a form that could take a memory operand.
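A minimal Python sketch of that no-ModRM form (opcode B8+rd), assuming the usual register numbering (rax=0 ... r15=15). The function name is hypothetical; it's just here to show where the register number and the 8-byte immediate go.

```python
import struct


def encode_mov_reg_imm(reg: int, imm: int, operand_size: int) -> bytes:
    """Encode mov r32, imm32 or mov r64, imm64 (movabs) via opcode B8+rd."""
    if not 0 <= reg <= 15 or operand_size not in (32, 64):
        raise ValueError("bad register number or operand size")
    rex = 0x40
    if operand_size == 64:
        rex |= 0x08                      # REX.W: 64-bit operand size, imm64 follows
    if reg >= 8:
        rex |= 0x01                      # REX.B: high bit of the register number
    opcode = 0xB8 + (reg & 7)            # low 3 bits of the register in the opcode
    if operand_size == 64:
        imm_bytes = struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)
    else:
        imm_bytes = struct.pack("<I", imm & 0xFFFFFFFF)
    prefix = bytes([rex]) if rex != 0x40 else b""   # omit a REX with no bits set
    return prefix + bytes([opcode]) + imm_bytes


print(encode_mov_reg_imm(0, 123, 32).hex())                 # mov eax, 123: 5 bytes
print(encode_mov_reg_imm(0, 0x1122334455667788, 64).hex())  # movabs rax: 10 bytes
```

There's nowhere in this format to describe a memory operand, which is why the destination has to be a register.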
From one POV, be grateful you get a mov with 64-bit immediates at all; RISC ISAs like AArch64 (with fixed-width 32-bit instructions) need more like 4 instructions just to get a 64-bit value into a register. (Unless it's a repeating bit-pattern; AArch64 is actually pretty cool, unlike earlier RISCs like MIPS64 or PowerPC64.)
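To illustrate the AArch64 point, here's a minimal Python sketch that counts instructions for a naive movz + movk sequence, one per non-zero 16-bit chunk. The function names are made up for illustration, and real compilers can sometimes do better (e.g. with movn or bitmask immediates), so this only models the simple strategy.

```python
def split_halfwords(value: int) -> list:
    """Split a 64-bit value into four 16-bit chunks, lowest chunk first."""
    value &= 0xFFFFFFFFFFFFFFFF
    return [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]


def naive_movz_movk_count(value: int) -> int:
    """One movz for the first non-zero chunk, one movk for each further one."""
    nonzero_chunks = sum(1 for chunk in split_halfwords(value) if chunk != 0)
    return max(nonzero_chunks, 1)        # even zero needs one instruction


print(naive_movz_movk_count(0x1122334455667788))  # 4
print(naive_movz_movk_count(123))                 # 1
```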
If AMD64 was going to introduce a new opcode for mov, mov r/m, sign_extended_imm8 would be vastly more useful to save code-size. It's not at all rare for compilers to emit multiple mov qword ptr [rsp+8], 0 instructions to zero a local array or struct, each one containing a 4-byte 0 immediate. Putting a small non-zero number in a register is also fairly common, and such an opcode would make mov eax, 123 a 3-byte instruction (down from 5), and mov rax, -123 a 4-byte instruction (down from 7). It would also make zeroing a register without clobbering FLAGS 3 bytes.
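Those sizes can be checked with a bit of Python. This is only a byte-count sketch: the imm8 form of mov is hypothetical (it was never added), and the helper and its parameter names are mine.

```python
def insn_length(*, rex=False, modrm=True, sib=False, disp=0, imm=0):
    """Instruction length in bytes, assuming a single opcode byte."""
    return int(rex) + 1 + int(modrm) + int(sib) + disp + imm


# (existing encoding, hypothetical mov r/m, sign_extended_imm8 encoding)
comparisons = {
    "mov eax, 123": (insn_length(modrm=False, imm=4),   # B8+rd imm32
                     insn_length(imm=1)),
    "mov rax, -123": (insn_length(rex=True, imm=4),     # REX.W C7 /0 imm32
                      insn_length(rex=True, imm=1)),
    "mov qword ptr [rsp+8], 0": (
        insn_length(rex=True, sib=True, disp=1, imm=4), # rsp base needs a SIB
        insn_length(rex=True, sib=True, disp=1, imm=1)),
}

for text, (existing, hypothetical) in comparisons.items():
    print(f"{text}: {existing} bytes now, {hypothetical} with an imm8 form")
```

The store case goes from 9 bytes to 6, which adds up when a compiler emits several of them in a row.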
Allowing mov imm64 to memory would be useful rarely enough that AMD decided it wasn't worth making the decoders more complex. In this case I agree with them, but AMD was very conservative with adding new opcodes. There were so many missed opportunities to clean up x86 warts; widening setcc, for example, would have been nice. (Intel finally got around to this with APX providing REX2 and EVEX prefixes for a zero-upper form of setcc.) But I think AMD wasn't sure AMD64 would catch on, and didn't want to be stuck needing a lot of extra transistors and/or power to support a feature if people didn't use it.
Footnote 1:
32-bit immediates in general are pretty obviously a good decision for code-size. It's very rare to want to add an immediate that's outside the +-2GiB range. Wider immediates could be useful for bitwise stuff like AND, but for setting/clearing/flipping a single bit the bts / btr / btc instructions are good (taking a bit-position as an 8-bit immediate, instead of needing a mask). You don't want sub rsp, 1024 to be an 11-byte instruction; 7 is already bad enough.
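For reference, a minimal Python sketch of how sub rsp, imm gets encoded today, picking the sign-extended imm8 form (83 /5) when possible and the imm32 form (81 /5) otherwise. The helper name is hypothetical.

```python
import struct


def encode_sub_rsp(imm: int) -> bytes:
    """Encode sub rsp, imm using the shortest existing immediate form."""
    rex_w = 0x48
    modrm = 0xEC          # mod=11, reg=/5 (sub), rm=100 (rsp)
    if -128 <= imm <= 127:
        return bytes([rex_w, 0x83, modrm]) + struct.pack("<b", imm)
    if -2**31 <= imm < 2**31:
        return bytes([rex_w, 0x81, modrm]) + struct.pack("<i", imm)
    raise ValueError("no sub r/m64, imm64 form exists")


print(encode_sub_rsp(8).hex())     # 4883ec08       (4 bytes)
print(encode_sub_rsp(1024).hex())  # 4881ec00040000 (7 bytes)
```

Swapping the 4-byte immediate for an 8-byte one is where the 11-byte figure comes from: 7 - 4 + 8.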
Giant instructions? Not very efficient
At the time AMD64 was designed (early 2000s), CPUs with uop caches weren't a thing. (Intel P4 with a trace cache did exist, but in hindsight it was regarded as a mistake.) Instruction fetch/decode happens in chunks of up to 16 bytes, so having one instruction that's nearly 16 bytes isn't much better for the front-end than movabs $imm64, %reg plus a separate store.
Of course if the back-end isn't keeping up with the front-end, that bubble of only 1 instruction decoded this cycle can be hidden by buffering between stages.
Keeping track of that much data for one instruction would also be a problem. The CPU has to put that data somewhere, and if there's a 64-bit immediate and a 32-bit displacement in the addressing mode, that's a lot of bits. Normally an instruction needs at most 64 bits of space for an imm32 + a disp32.
BTW, there are special no-ModRM opcodes for most operations with RAX and an immediate. (x86-64 evolved out of 8086, where AX/AL was more special; see this for more history and explanation.) It would have been a plausible design for those add/sub/cmp/and/or/xor/... rax, sign_extended_imm32 forms with no ModRM to instead use a full imm64. The most common case for RAX, immediate uses an 8-bit sign-extended immediate (-128..127), not this form anyway, and it only saves 1 byte for instructions that need a 4-byte immediate. If you do need an 8-byte constant, though, putting it in a register or memory for reuse would be better than doing a 10-byte and-imm64 in a loop.
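To see the 1-byte saving, here's a Python sketch listing the existing encodings of and rax, imm: the ModRM forms (83 /4 ib and 81 /4 id) and the RAX-only short form (25 id). The function name and dictionary keys are just labels I chose.

```python
import struct


def and_rax_encodings(imm: int) -> dict:
    """Return the existing encodings of and rax, imm that can hold imm."""
    if not -2**31 <= imm < 2**31:
        raise ValueError("and has no imm64 form")
    rex_w = 0x48
    modrm = 0xE0          # mod=11, reg=/4 (and), rm=000 (rax)
    imm32 = struct.pack("<i", imm)
    encodings = {
        "modrm_imm32": bytes([rex_w, 0x81, modrm]) + imm32,
        "rax_short_imm32": bytes([rex_w, 0x25]) + imm32,   # no ModRM byte
    }
    if -128 <= imm <= 127:
        encodings["modrm_imm8"] = bytes([rex_w, 0x83, modrm]) + struct.pack("<b", imm)
    return encodings


for name, code in and_rax_encodings(0x12345678).items():
    print(name, len(code), code.hex())
```

For small constants the imm8 ModRM form (4 bytes) beats both, which is why the short form rarely matters in practice.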