
Hey, I'm wondering about some instructions that should only be valid in long mode.

For example, `0f 20 55` decodes as `mov rbp, cr2`.
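To see where the operands come from, the ModRM byte `55` can be split into its mod, reg and r/m fields. This is a minimal Python sketch; the helper name `split_modrm` is hypothetical and not taken from any disassembler.

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (modrm >> 6) & 0b11   # bits 7-6
    reg = (modrm >> 3) & 0b111  # bits 5-3
    rm = modrm & 0b111          # bits 2-0
    return mod, reg, rm

# 0F 20 is the "MOV r, CRn" opcode; 55 is the ModRM byte from the question.
print(split_modrm(0x55))  # (1, 2, 5)
```

For `0F 20 /r`, the reg field (2) selects the control register (`cr2`) and the r/m field (5) selects the general-purpose register (`ebp`/`rbp`). Intel's description of MOV to/from control registers states that the mod bits are ignored for this instruction, so the `01` in the mod field does not imply a memory operand.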

I'm referencing the ref.x86asm.net XML mapping.
According to the XML, the mode of operation of this instruction is `e`, which means:

> `e`: applies for 64-bit mode. SMM is not taken into account. (Example: `63` MOVSXD)

Now if I look at disassemblers such as GCC or Capstone, the byte stream `0f 20 55` is decoded as `mov ebp, cr2` in protected mode, even though the reference says it shouldn't be available in modes other than x64.

So am I misunderstanding something, or are these disassemblers at fault?

Jorayen

1 Answer


Moves to and from control registers are available in both protected mode and long mode, using the same encoding but with a different meaning. `mov rbp, cr2` is only available in long mode (obviously, since it writes to a 64-bit GPR, which only exists in long mode), and `mov ebp, cr2` is only available in protected mode (it is not inherently impossible in long mode, but its encoding was reused for `mov rbp, cr2`, just as the encoding of `push eax` was reused to mean `push rax`). The disassemblers correctly interpreted the same machine code differently depending on the mode.
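A simplified Python sketch of this point: the same bytes `0F 20 55` produce a 32-bit or a 64-bit destination depending only on the mode the decoder assumes. The function and table names are hypothetical. The sketch assumes, per Intel's documentation of MOV to/from control registers, that in 64-bit mode the operand size is always 64 bits (no REX.W needed) and that REX.R and REX.B extend the control-register and GPR numbers; it does not check whether a CR number is architecturally valid.

```python
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_mov_from_cr(modrm: int, mode: int, rex: int = 0) -> str:
    """Decode 0F 20 /r (MOV r, CRn) for mode 32 or 64 (simplified sketch)."""
    reg = (modrm >> 3) & 0b111  # control register number
    rm = modrm & 0b111          # general-purpose register; mod bits are ignored
    if mode == 64:
        reg |= ((rex >> 2) & 1) << 3  # REX.R extends the CR number (e.g. CR8)
        rm |= (rex & 1) << 3          # REX.B extends the GPR number
        return f"mov {GPR64[rm]}, cr{reg}"  # always 64-bit operand size
    if mode == 32:
        # No REX in 32-bit mode (bytes 40-4F are INC/DEC there), so rex is ignored.
        return f"mov {GPR32[rm]}, cr{reg}"
    raise ValueError("mode must be 32 or 64")

print(decode_mov_from_cr(0x55, 32))  # mov ebp, cr2
print(decode_mov_from_cr(0x55, 64))  # mov rbp, cr2
```

Nothing in the byte stream itself says which mode applies; the caller has to supply it, which is why disassemblers such as Capstone take the mode as an explicit setting.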

harold
  • What I'm struggling with more is what the reference means by "applies for 64-bit mode". If I come across other instructions with mode `e`, should I assume they only work in x64, or something else? – Jorayen Sep 15 '21 at 19:26
  • @Jorayen I think the most consistent interpretation is "if you encode this as 64-bit code and then decode it as 32-bit code, you get something with a different meaning". In the case of `mov rbp, cr2` the difference in meaning is small; in other cases it can be large, for example `movsxd` turns into `arpl`. – harold Sep 15 '21 at 19:33
  • @Jorayen: The point is, you have to know what mode you're in to interpret any opcodes. – Joshua Sep 15 '21 at 19:34
  • @harold Maybe, on the note of `movsxd` and `arpl`, can you shed light on why https://wiki.osdev.org/X86-64_Instruction_Encoding#64-bit_addressing says a REX prefix must be encoded when `using 64-bit operand size and the instruction does not default to 64-bit operand size`, yet both GCC and Capstone decode `63 33` as `movsxd rsi, dword ptr [rbx]` even though no REX prefix is present? – Jorayen Sep 15 '21 at 20:42
  • @Jorayen: Intel's manual says this encoding is "discouraged": https://www.felixcloutier.com/x86/movsx:movsxd. As best I can tell, it should be disassembled as `movsxd esi, dword ptr [rbx]`, i.e. it "sign-extends" a 32-bit value to a 32-bit value, which doesn't actually extend at all. I'd expect the high half of `rsi` to be zeroed unconditionally. But I have not tested it. – Nate Eldredge Sep 15 '21 at 21:08
  • @Jorayen I believe GCC and Capstone are wrong (and the disassembler in Visual Studio disagrees with them). Discouraged or not, the version of `movsxd` with a 32-bit destination register exists; `movsxd` does not have a 64-bit operand size by default. On my PC, executing a 32-bit `movsxd` with a negative source clears the upper 32 bits of the destination, so it's being interpreted as a 32-bit instruction. But don't take my word for it, it's a weird edge case. – harold Sep 15 '21 at 21:14
  • @Jorayen: Now I have tested it and yes, that's what it does. So that's arguably a bug in those disassemblers. gdb says `movslq (%rbx),%esi`, which I think is right. – Nate Eldredge Sep 15 '21 at 21:15
  • @Jorayen: Btw, the disassembler on defuse.ca must be using objdump, not GCC (a compiler doesn't need a disassembler). They might have an old or buggy version, since my GNU objdump 2.34 disassembles correctly as `esi` in either AT&T or Intel mode. – Nate Eldredge Sep 15 '21 at 21:31
  • @Jorayen: Even more fun: `movsxd ax, [mem]` is encodeable, and according to my testing on Skylake ([MOVZX missing 32 bit register to 64 bit register](https://stackoverflow.com/q/51387571)), Intel's manual is correct that `MOVSXD r16, r/m16` is encodeable with a 66 prefix, and does only read 2 bytes from memory, not 4. (We can tell by running it on the last 2 bytes before an unmapped page.) My testing also found that `movsxd eax,edx` has a false output dependency on the RAX destination register, again on Skylake, unlike `movsxd rax, edx`. It still zeros the upper bytes, so there's no actual dependency. – Peter Cordes Sep 16 '21 at 01:41
  • (ping @Nate, see my previous comment. Yes, MOVSXD with a 32-bit or even 16-bit destination is encodeable, but slower than the normal version with a REX.W. There's no reason to actually use it vs. `mov`, except for silly computer tricks / obfuscation. So yes, current GNU binutils disassembly is correct, and the disassemblers Jorayen used are clearly wrong.) – Peter Cordes Sep 16 '21 at 01:44
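The comments above describe two mode and prefix effects on opcode `63`: outside 64-bit mode it is ARPL, and in 64-bit mode the MOVSXD destination size comes from REX.W (64-bit), a 66 prefix (16-bit), or otherwise the default 32-bit operand size. A minimal Python sketch of that selection logic, assuming only a plain `[base]` memory operand; the function name `decode_63` and the register tables are hypothetical, and SIB bytes, displacements, RIP-relative addressing, REX.R/X/B and the 67 prefix are deliberately not handled.

```python
GPR16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_63(prefixes: bytes, modrm: int, mode: int) -> str:
    """Decode opcode 63 /r with a plain [base] memory operand (simplified sketch)."""
    mod, reg, rm = (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0 or rm in (4, 5):  # rm 4 = SIB, rm 5 = disp32 / RIP-relative
        raise NotImplementedError("only [base] memory operands are handled")

    if mode == 32:
        # Outside 64-bit mode, 63 /r is ARPL r/m16, r16 (prefixes ignored here).
        return f"arpl word ptr [{GPR32[rm]}], {GPR16[reg]}"
    if mode != 64:
        raise ValueError("mode must be 32 or 64")

    # A REX prefix, if present, must be the last byte before the opcode.
    rex = prefixes[-1] if prefixes and 0x40 <= prefixes[-1] <= 0x4F else 0
    if rex & 0b0111:
        raise NotImplementedError("REX.R/X/B are not handled")

    if rex & 0b1000:          # REX.W: MOVSXD r64, r/m32 (REX.W overrides 66)
        dest, src = GPR64[reg], "dword"
    elif 0x66 in prefixes:    # 66 prefix: MOVSXD r16, r/m16
        dest, src = GPR16[reg], "word"
    else:                     # default 32-bit operand size: MOVSXD r32, r/m32
        dest, src = GPR32[reg], "dword"
    # 64-bit mode uses 64-bit addressing by default.
    return f"movsxd {dest}, {src} ptr [{GPR64[rm]}]"

for pfx in (b"", b"\x48", b"\x66"):
    print(decode_63(pfx, 0x33, 64))
print(decode_63(b"", 0x33, 32))
# movsxd esi, dword ptr [rbx]
# movsxd rsi, dword ptr [rbx]
# movsxd si, word ptr [rbx]
# arpl word ptr [ebx], si
```

Under these assumptions, `63 33` without a REX prefix yields the 32-bit destination `esi`, matching the objdump and gdb output reported in the comments rather than the `rsi` that Jorayen's disassemblers printed.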