
I have a disassembled program that contains a snippet of code like this:

    0x804873b <main>        push   %ebp
    0x804873c <main+1>      mov    %esp,%ebp
    0x804873e <main+3>      and    $0xfffffff0,%esp
    0x8048741 <main+6>      sub    $0x50,%esp
    0x8048744 <main+9>      cmpl   $0x2,0x8(%ebp)
    0x8048748 <main+13>     je     0x8048770 <main+53>
    0x804874a <main+15>     mov    0xc(%ebp),%eax

From the addresses, I would assume the mov is a 2-byte instruction, but if I do

    x/i 0x804873d
    0x804873d <main+2>:  in     $0x83,%eax

Why am I getting this other instruction? Are these the real (non-pseudo) instructions, or is this GDB's fault?

ks1322
knowads
  • `mov %esp,%ebp` begins at `0x804873c`, whereas you are trying to disassemble at `0x804873d`. – hidefromkgb Feb 12 '18 at 10:52
  • Yes, `mov ebp,esp` is the two-byte opcode `89 E5`. When you decide to disassemble from the `E5` byte onward, you get the `in` instruction. Why would you NOT get this other thing? Bytes are bytes: if you point the CPU at them to execute, or a disassembler at them to decode, it will try. The x86 ISA is so rich that almost every value forms a valid instruction. – Ped7g Feb 12 '18 at 11:14
  • You might have simply made a typo in the address, as the last digit is `b`, not `d`. If it was deliberate, then see Ped7g's comment above. – Jester Feb 12 '18 at 11:51
  • Variable-length instruction sets are very difficult to disassemble: you have to disassemble in execution order from a known-correct address (assuming no code has been placed to defeat the disassembler). Toolchains like GNU place enough information in the ELF file, or other similar file formats, to make disassembly work on an instruction set like x86. If you don't have that information, you can only get so far; if you start at a bad address and/or disassemble in linear address order rather than execution order, expect it to fail. – old_timer Feb 12 '18 at 16:25

1 Answer


Yes, the mov is 2 bytes long, but it starts at 0x...c, and thus the next instruction starts at 0x...e.

You disassembled starting at the 2nd byte of mov (0x...d), not the start of the next instruction that would execute if decoding starts at the top of the function.

The bytes starting with the ModR/M byte of the mov do represent a valid instruction (like most sequences of bytes), because the x86 opcode space is mostly full. Usually you only get an invalid instruction when the following bytes encode operands that aren't compatible with the opcode; this is rare because most opcodes don't have any incompatible operand patterns.

TL;DR: if you decode "out of sync", you will usually get valid x86 instructions, and may never get back "in sync" with what the compiler emitted. The same applies to the CPU decoding instructions after a jump; one code-obfuscation technique is to hide instructions in immediate data or something similar, and jump to that.
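To make the out-of-sync effect concrete, here is a minimal Python sketch of a toy length decoder. It is not a real disassembler: it only knows the handful of opcodes that occur in the question's listing. The byte string is an assumption reconstructed from the instruction addresses and the usual short encodings (it agrees with the `89 E5` mentioned in the comments and with GDB's `in $0x83,%eax`); the names `CODE`, `BASE`, `modrm_size`, `insn` and `decode` are made up for this illustration.

```python
# Bytes of <main> as implied by the listing (assumed standard short encodings).
CODE = bytes.fromhex("55 89e5 83e4f0 83ec50 837d0802 7426 8b450c")
BASE = 0x804873b  # address of <main> in the listing


def modrm_size(modrm):
    """Size of a ModR/M byte plus displacement, for the forms used here."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:               # register operand, no displacement
        return 1
    if mod == 1 and rm != 4:   # disp8(%reg), no SIB byte
        return 2
    raise NotImplementedError(f"ModR/M {modrm:#04x} not handled by this toy")


def insn(code, i):
    """Return (length, text) for the instruction starting at code[i]."""
    op = code[i]
    if op == 0x55:
        return 1, "push %ebp"
    if op in (0xE4, 0xE5):     # in $imm8,%al  /  in $imm8,%eax
        reg = "%al" if op == 0xE4 else "%eax"
        return 2, f"in ${code[i + 1]:#x},{reg}"
    if op == 0x74:
        return 2, "je rel8"
    if op in (0x89, 0x8B):     # mov with a ModR/M operand
        return 1 + modrm_size(code[i + 1]), "mov"
    if op == 0x83:             # group-1 ALU op (and, sub, cmp, ...) with imm8
        return 1 + modrm_size(code[i + 1]) + 1, "alu $imm8,r/m32"
    raise NotImplementedError(f"opcode {op:#04x} not handled by this toy")


def decode(start_addr):
    """Linear sweep from start_addr; returns [(address, text), ...]."""
    i, out = start_addr - BASE, []
    while i < len(CODE):
        length, text = insn(CODE, i)
        out.append((BASE + i, text))
        i += length
    return out


for start in (BASE, 0x804873d):
    print(f"decoding from {start:#x}:")
    for addr, text in decode(start):
        print(f"  {addr:#x}  {text}")
```

Starting at `0x804873d`, the sketch reads the `E5` ModR/M byte of the mov as an `in` opcode and takes the first byte of the following `and` as its immediate. With these particular bytes the sweep happens to land on a real instruction boundary again at `0x8048741`; that is luck, not something a decoder can count on.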

The same trick can even be used for optimization: to keep the first instruction of a loop from executing on the first iteration, consume its bytes as the immediate of a mov eax, imm32 or something similar. This is smaller than jumping over the instruction or peeling the first iteration, and may not be any slower if it doesn't confuse the uop cache / loop buffer.
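As a rough illustration of that layout, the Python sketch below walks a hypothetical 8-byte sequence (made up for this example, not taken from the question's binary): a `B8` opcode whose 4-byte immediate is itself the encoding of `addl $0x1,-0x4(%ebp)`, followed by `dec %ebx` and a `jne` back to offset 1. The helper names `length` and `boundaries` are likewise invented, and the length function only handles these four opcodes.

```python
# Hypothetical bytes: mov $imm32,%eax ; dec %ebx ; jne <offset 1>
# The imm32 (offsets 1..4) is also the encoding of addl $0x1,-0x4(%ebp).
CODE = bytes.fromhex("b8 8345fc01 4b 75f9")


def length(code, i):
    """Instruction length at code[i], for the few opcodes used above."""
    op = code[i]
    if op == 0xB8:                      # mov $imm32,%eax
        return 5
    if op == 0x4B:                      # dec %ebx
        return 1
    if op == 0x75:                      # jne rel8
        return 2
    modrm = code[i + 1]
    if op == 0x83 and modrm >> 6 == 1 and modrm & 7 != 4:
        return 4                        # ALU op: opcode, ModR/M, disp8, imm8
    raise NotImplementedError(f"opcode {op:#04x} not handled by this toy")


def boundaries(code, start):
    """Offsets at which instructions begin when decoding from `start`."""
    out, i = [], start
    while i < len(code):
        out.append(i)
        i += length(code, i)
    return out


first_pass = boundaries(CODE, 0)        # entry: the add is swallowed by the mov
later_pass = boundaries(CODE, 1)        # back-edge target: the add executes

# Target of the jne: address after the jne plus its signed 8-bit displacement.
jne_at = first_pass[-1]
target = jne_at + 2 + int.from_bytes(CODE[jne_at + 1:jne_at + 2], "little", signed=True)

# What the first pass loads into %eax: the add's bytes read as little-endian.
imm32 = int.from_bytes(CODE[1:5], "little")

print(first_pass, later_pass, target, hex(imm32))
```

Entering at offset 0 costs one extra byte (the `B8` opcode) and clobbers `%eax` with a meaningless value, so this only works when that register is dead at loop entry.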

Peter Cordes