
I've taken a look at the ref.x86asm.net manual, and while it has a lot of information, I can't really make heads or tails of it because I don't know how to interpret the registers.

I'm using 64-bit code, and for a nearby line lldb shows me the following:

48 89 e5 movq %rsp, %rbp

I know from the above statement 89 is the move command. rsp is source and rbp is the destination (I'm on osx).

In the ref.x86asm.net manual, the entry for "89" has 'r' in the o column, r/m16/32/64 as op1, and r16/32/64 as op2. I looked up those values but really don't understand how it is all supposed to work out. I saw references to REX in other people's answers but don't know what that means.

XX 89 XX movq %rax, %rdi ; how do I do this? What are the XX?

I'm writing my own byte code and have figured out most of it by writing C, compiling it, and then looking at the result in lldb. However, I'd save a lot of time if I had a better understanding of how registers are encoded in the instruction bytes.

Peter Cordes
Div
  • The linked question explains where in the Intel manuals to find the docs for the ModRM byte which encodes the src and dst operands. It's actually asking about an instruction that uses the /r field as extra opcode bits, unlike `mov`, but the important point is that your 3rd byte is the ModR/M byte, and the first byte is the REX prefix. – Peter Cordes Aug 11 '16 at 23:16
  • The first "XX" is the `48` => that's the "REX" prefix, indicating the instruction is working with 64-bit `%rax`, not `%eax`. Without the REX `48` prefix, that `89 E5` would be decoded as `movl %esp, %ebp`. The second "XX" is the ModR/M byte. – Ped7g Aug 12 '16 at 10:54
  • [this article](http://wiki.osdev.org/X86-64_Instruction_Encoding) might help you. – fuz Aug 12 '16 at 10:55
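
To make the comments above concrete, here is a minimal sketch of how the two "XX" bytes can be computed for a register-to-register 64-bit `mov` with opcode `89`. It is not taken from any of the posts; the function name `encode_mov_r64_r64` and the `REG64` table are made up for illustration, and the sketch deliberately ignores memory operands, SIB bytes, and displacements.

```python
# Register numbers as they appear in ModR/M; numbers 8-15 need a REX extension bit.
REG64 = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
    "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
    "r8": 8, "r9": 9, "r10": 10, "r11": 11,
    "r12": 12, "r13": 13, "r14": 14, "r15": 15,
}

def encode_mov_r64_r64(src: str, dst: str) -> bytes:
    """Encode AT&T `movq %src, %dst` with opcode 89 (MOV r/m64, r64).

    Hypothetical helper: register-direct operands only.
    """
    reg = REG64[src]  # opcode 89 puts the source in ModR/M.reg
    rm = REG64[dst]   # and the destination in ModR/M.rm
    # REX = 0100WRXB: W=1 selects 64-bit operands, R extends reg, B extends rm.
    rex = 0x40 | (1 << 3) | ((reg >> 3) << 2) | (rm >> 3)
    # ModR/M = mod (2 bits), reg (3 bits), rm (3 bits); mod=0b11 is register-direct.
    modrm = (0b11 << 6) | ((reg & 7) << 3) | (rm & 7)
    return bytes([rex, 0x89, modrm])

print(encode_mov_r64_r64("rsp", "rbp").hex(" "))  # 48 89 e5
print(encode_mov_r64_r64("rax", "rdi").hex(" "))  # 48 89 c7
```

The first call reproduces the bytes lldb showed in the question; the second is the instruction the question asks about.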

2 Answers


It's relatively hard to determine by hand, so when I have to do it, I write the instruction in an assembly file, assemble that and check the output. I personally use nasm.

Your file would look like this (note that this is Intel syntax, not AT&T like in your example):

[BITS 64]
main:
    mov rdi, rax

Off the top of my head, you assemble with nasm file.asm -f bin -o output, and then you open output with a hex editor. (-f bin tells nasm to produce a flat binary, which is just raw machine code.)

NASM is just one of many assemblers. Keystone might be easier for single-instruction tests. Alternatively, you can get pwntools and use its asm module.

The basic idea being: use an assembler instead of trying to figure it out by hand.

zneak
  • I think OP doesn't want just to determine it by hand, he wants to *understand* how it works, and code his own [limited?] assembler. For whatever reason he has (maybe educational). Your shortcut of using a ready-made tool doesn't really help in such case. Also it looks like he already knows how to get real byte code of some instruction, as he has `48 89 e5` in the question. – Ped7g Aug 12 '16 at 11:02

You can use the assembler (as) to figure out the bytes, and use otool to print them:

:; echo 'movq %rax, %rdi' | as
:; otool -tvj a.out
a.out:
(__TEXT,__text) section
0000000000000000    4889c7              movq    %rax, %rdi

Note that as writes to the file a.out by default.
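
If you also want to see what those three bytes mean, they can be split into their bit fields by hand. The following is only a rough sketch under the assumption that the instruction is exactly REX prefix + one opcode byte + ModR/M byte; `decode_rex_modrm` is a made-up name, not part of as or otool.

```python
def decode_rex_modrm(code: bytes) -> dict:
    """Split a 3-byte REX + opcode + ModR/M instruction into its bit fields.

    Hypothetical helper: assumes no SIB byte, displacement, or immediate.
    """
    rex, opcode, modrm = code
    if rex & 0xF0 != 0x40:
        raise ValueError("first byte is not a REX prefix")
    return {
        "W": (rex >> 3) & 1,                                 # 1 = 64-bit operand size
        "opcode": opcode,
        "mod": modrm >> 6,                                   # 3 = register-direct
        "reg": (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7), # REX.R extends reg
        "rm": ((rex & 1) << 3) | (modrm & 7),                # REX.B extends rm
    }

fields = decode_rex_modrm(bytes.fromhex("4889c7"))
print(fields)
```

For `48 89 c7` this gives reg = 0 (rax, the source for opcode 89) and rm = 7 (rdi, the destination), with mod = 3 meaning both operands are registers.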

rob mayoff
  • I don't think that's useful to OP. He already has all the information these two tools can give him. – fuz Aug 12 '16 at 10:55