
Looking at some assembly code for x86_64 on my Mac, I see the following instruction:

48 c7 c0 01 00 00 00  movq    $0x1,%rax

But nowhere can I find a reference that breaks down the opcode. It seems like 48c7 is a move instruction, c0 defines the %rax register, etc.

So, where can I find a reference that tells me all that?

I am aware of http://ref.x86asm.net/, but looking at 48 opcodes, I don't see anything that resembles a move.

Johan
Christoph
  • I've seen similar questions here. If I could find this on Google, I wouldn't have asked. The fact that I am aware of the reference I posted in my question also shows that I am not just too lazy to search myself. – Christoph Jun 24 '12 at 19:58
  • @Oded, googling for "x86 0x48 instruction prefix" is quite tricky if you don't know what you are looking for... – Griwes Jun 24 '12 at 19:59
  • @Oded I reworded my question to be more developer specific. Given the (really good!) reference at x86asm.net, I guess I just need to understand how that opcode is broken up. Griwes helped with that. – Christoph Jun 24 '12 at 20:01
  • If you didn't find the 0x48 at x86asm.net, that's because you didn't look right: http://ref.x86asm.net/coder64.html#x48 . -1. – Gunther Piez Jun 24 '12 at 22:50
  • I was looking for a mov. I know better now, thanks. – Christoph Jun 25 '12 at 15:35

2 Answers


Actually, the mov opcode here is 0xc7; 0x48 is, in this case, the long-mode REX.W prefix, which selects a 64-bit operand size.
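As a rough illustration of what the prefix byte carries, here is a minimal Python sketch that splits a REX byte into its four flag bits. The function name `decode_rex` is made up for this example; the bit layout (0100WRXB) is the one given in the Intel and AMD manuals.

```python
def decode_rex(byte):
    """Split a REX prefix byte (0x40-0x4F) into its W, R, X and B bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModR/M reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModR/M r/m, SIB base, or opcode reg
    }

print(decode_rex(0x48))
```

For 0x48 only the W bit is set, which is why the instruction is shown as movq rather than movl.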

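To also answer the question from the comments: 0xc0 is the ModR/M byte, b11000000. Its mod field (the top two bits) is 11, which selects a register operand. Its reg field (the middle three bits) is 000, which for opcode 0xc7 is an opcode extension (/0) rather than a register. Its r/m field (the low three bits) is 000. With REX.B = 0 (the REX prefix is 0x48, so the .B bit is unset), r/m = 000 means "RAX is the first operand" (in Intel syntax, mov rax, 1: RAX comes first and, in the case of mov, is the output operand). The ModR/M tables in the x86 opcode references show how to read this byte.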
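The field split described above can be sketched in a few lines of Python. The names `decode_modrm` and `REG64` are invented for this example, and the register table assumes 64-bit operands.

```python
# 64-bit register names indexed by their 4-bit encoding (assumed table).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_modrm(byte, rex_r=0, rex_b=0):
    """Return (mod, reg, rm), with reg and rm widened by REX.R and REX.B."""
    mod = (byte >> 6) & 0b11                     # 0b11 = register-direct operand
    reg = ((byte >> 3) & 0b111) | (rex_r << 3)   # register or opcode extension
    rm = (byte & 0b111) | (rex_b << 3)           # register or memory operand
    return mod, reg, rm

mod, reg, rm = decode_modrm(0xC0)
# For opcode 0xc7 the reg field is the /0 opcode extension, not a register.
print(mod, reg, REG64[rm])
```

Passing `rex_b=1` (as a 0x49 prefix would) turns the same ModR/M byte into a reference to r8 instead of rax.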

Griwes

When you look at the binary

 48 c7 c0 01 00 00 00

you need to disassemble it in order to understand its meaning.

The disassembly algorithm is not difficult, but it is intricate: it involves looking things up in several tables.

The algorithm is described in Volume 2 of the Intel Software Developer's Manual:

Intel® 64 and IA-32 Architectures
Software Developer’s Manual
Volume 2 (2A, 2B & 2C):
Instruction Set Reference, A-Z

Start reading at the chapter called INSTRUCTION FORMAT.

Alternatively, there are good books that dedicate whole chapters to this topic, such as

  X86 Instruction Set Architecture, Mindshare, by Tom Shanley.

A whole chapter is dedicated to disassembling binary X86.

Or you can start by reading the general algorithm in AMD's manual for the same instruction set:

AMD64 Architecture
Programmer’s Manual
Volume 3:
General-Purpose and System Instructions

There, in the chapter Instruction Encoding, you will find the automaton that defines this language of instructions, and from this diagram you can easily write the decoder.
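To give a feel for the staged structure such a decoder has (prefix, then opcode, then ModR/M, then immediate), here is a deliberately tiny Python sketch. It is not taken from either manual: the function name `decode_mov_imm` and the register tables are hypothetical, and it handles only one form, an optional REX prefix followed by C7 /0 with a register operand and a 32-bit immediate. Legacy prefixes, memory operands, SIB and displacement bytes are all left out.

```python
import struct

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + \
        [f"r{i}" for i in range(8, 16)]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"] + \
        [f"r{i}d" for i in range(8, 16)]

def decode_mov_imm(code):
    """Decode `[REX] C7 /0 ModR/M(mod=11) imm32`; return (text, length)."""
    pos = 0
    rex = 0
    # Stage 1: optional REX prefix (0x40-0x4F in 64-bit mode).
    if (code[pos] & 0xF0) == 0x40:
        rex = code[pos]
        pos += 1
    # Stage 2: primary opcode byte.
    opcode = code[pos]
    pos += 1
    if opcode != 0xC7:
        raise NotImplementedError(f"opcode {opcode:#04x} not handled")
    # Stage 3: ModR/M byte; for C7 the reg field is an opcode extension.
    modrm = code[pos]
    pos += 1
    mod = modrm >> 6
    ext = (modrm >> 3) & 0b111
    rm = (modrm & 0b111) | ((rex & 1) << 3)   # REX.B extends r/m
    if ext != 0 or mod != 0b11:
        raise NotImplementedError("only C7 /0 with a register operand")
    # Stage 4: 32-bit little-endian immediate (sign-extended when REX.W=1).
    (imm,) = struct.unpack_from("<i", code, pos)
    pos += 4
    if (rex >> 3) & 1:                        # REX.W selects 64-bit operands
        text = f"movq ${imm:#x},%{REG64[rm]}"
    else:
        text = f"movl ${imm:#x},%{REG32[rm]}"
    return text, pos

print(decode_mov_imm(bytes.fromhex("48 c7 c0 01 00 00 00")))
```

A real decoder replaces each hard-coded check with a table lookup, which is where the opcode maps in the manuals come in.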

After that, you can come back to Volume 2 of the Intel manual and use it as a reference book.

I also found the reverse engineering class from http://opensecuritytraining.info/ useful. The site was created by a PhD student from CMU; most of it is well done, but it takes longer to study and apply.

Once you understand the basic ideas, you can look at a free project that implements the algorithm. I found the distorm project useful. At the beginning it is important not to look at more abstract projects (like qemu or objdump), which try to implement disassemblers for many instruction sets in the same code, because you will get lost. Distorm focuses only on x86 and implements it correctly and exhaustively. It expresses the definition of the x86 language formally, whereas the Intel and AMD manuals define it in natural language.

Another project that works well is udis86.

alinsoar