0x004012d0 <main+0>:    push   %ebp
0x004012d1 <main+1>:    mov    %esp,%ebp
0x004012d3 <main+3>:    sub    $0x28,%esp

If the addresses are not available, can we calculate the instruction sizes ourselves?

That is, suppose we only have this:

push   %ebp
mov    %esp,%ebp
sub    $0x28,%esp
Ciro Santilli OurBigBook.com
Mask

5 Answers


The number of bytes in an instruction is the difference between the addresses of adjacent instructions:

0x004012d0 <main+0>:    push   %ebp ;1 byte
0x004012d1 <main+1>:    mov    %esp,%ebp ;2 bytes
0x004012d3 <main+3>:    sub    $0x28,%esp
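The subtraction can be automated. Below is a minimal Python sketch, assuming the lines look like the gdb output in the question (address, `<symbol+offset>:`, then the instruction). `sizes_from_listing` is a hypothetical helper name. As with doing it by hand, the last instruction's size stays unknown because no following address is shown.

```python
import re

# Matches gdb "disassemble" lines such as
# "0x004012d0 <main+0>:    push   %ebp"
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+<[^>]*>:\s*(.*)$")


def sizes_from_listing(lines):
    """Return (instruction text, size in bytes or None) pairs.

    Each size is the distance to the next instruction's address; the
    last one is None because the listing shows no address after it.
    """
    parsed = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            parsed.append((int(m.group(1), 16), m.group(2).strip()))
    result = []
    for i, (addr, text) in enumerate(parsed):
        if i + 1 < len(parsed):
            result.append((text, parsed[i + 1][0] - addr))
        else:
            result.append((text, None))
    return result


listing = [
    "0x004012d0 <main+0>:    push   %ebp",
    "0x004012d1 <main+1>:    mov    %esp,%ebp",
    "0x004012d3 <main+3>:    sub    $0x28,%esp",
]
for text, size in sizes_from_listing(listing):
    print(f"{text:<20} {size if size is not None else 'unknown'}")
```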

If you only have the text, use these references: http://www.swansontec.com/sintel.html and http://faydoc.tripod.com/cpu/conventions.htm, and add up the bytes of each instruction's prefixes, opcode and operands.
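The bookkeeping for the three instructions in the question might look like the following Python sketch. The per-field breakdown (push %ebp = 55, mov %esp,%ebp = 89 E5, sub $0x28,%esp = 83 EC 28) is hand-derived from the IA-32 opcode tables for 32-bit code. The `Encoding` class is only an illustrative container, not part of any tool. The hard part in practice is deciding which fields are present, which is what the tables linked above are for.

```python
from dataclasses import dataclass


@dataclass
class Encoding:
    """Component fields of one IA-32 instruction (sizes in bytes)."""
    prefixes: int = 0       # e.g. 0x66 operand-size, segment overrides
    opcode: int = 1         # 1 to 3 bytes
    modrm: bool = False     # ModR/M byte present?
    sib: bool = False       # SIB byte present?
    displacement: int = 0   # 0, 1, 2 or 4
    immediate: int = 0      # 0, 1, 2 or 4

    def length(self) -> int:
        return (self.prefixes + self.opcode + int(self.modrm)
                + int(self.sib) + self.displacement + self.immediate)


# Hand-derived breakdown of the question's instructions (32-bit code):
examples = {
    "push %ebp": Encoding(),                               # 55
    "mov %esp,%ebp": Encoding(modrm=True),                 # 89 E5
    "sub $0x28,%esp": Encoding(modrm=True, immediate=1),   # 83 EC 28
}
for text, enc in examples.items():
    print(f"{text:<16} {enc.length()} byte(s)")
```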

Andrey

You can't necessarily determine the instruction size from the mnemonic. Here are some special cases:

  • If you're in a 16-bit segment, mov eax, 0 requires a 0x66 operand-size prefix, while in a 32-bit segment it doesn't. You need to know the size of the segment.

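  • Some mnemonics can be encoded in more than one way. In 32-bit code, add eax, 1 can be 0x83 0xc0 0x01 (sign-extended 8-bit immediate), 0x05 0x01 0x00 0x00 0x00 (short EAX form) or 0x81 0xc0 0x01 0x00 0x00 0x00 (32-bit immediate). Likewise, inc eax can be either 0x40 or 0xff 0xc0. (inc eax is not a drop-in replacement for add eax, 1, since inc leaves the carry flag unchanged.)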

  • The memory operand [eax] can be encoded with just a MOD/RM byte, or with an additional SIB byte after the MOD/RM: either with eax as the base and no index, or with eax as the index and no base, which also forces a 4-byte displacement.

  • In 64-bit mode you need a REX prefix (0x40-0x4f) to encode the registers r8-r15. However, 0x40 is also accepted as a sort of null REX prefix on instructions that don't need one, which adds another byte to the instruction.

  • Segment override prefixes may be used even when the explicit segment is the same as the implicit one, and each one adds a byte.
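To make the list above concrete, here are hand-assembled byte sequences (from the Intel opcode tables) for several of these cases, with their lengths computed by Python. Each group performs the same operation. The sequences were worked out by hand for illustration rather than captured from a particular assembler, so treat them as a sketch to check against your own tools.

```python
# Hand-assembled encodings (hex). Within each group the operation is the
# same; only the encoding, and therefore the length, differs.
ENCODINGS = {
    "add eax, 1 (32-bit mode)": {
        "83 /0 ib (imm8, sign-extended)": "83 c0 01",
        "05 id (short EAX form)": "05 01 00 00 00",
        "81 /0 id (imm32)": "81 c0 01 00 00 00",
    },
    "inc eax (32-bit mode)": {
        "40+rd": "40",
        "ff /0": "ff c0",
    },
    "mov ecx, [eax] (32-bit mode)": {
        "ModR/M only": "8b 08",
        "SIB, eax as base, no index": "8b 0c 20",
        "SIB, eax as index, no base + disp32": "8b 0c 05 00 00 00 00",
        "redundant DS: override": "3e 8b 08",
    },
    "mov eax, 0": {
        "32-bit segment": "b8 00 00 00 00",
        "16-bit segment (0x66 prefix)": "66 b8 00 00 00 00",
    },
    "mov ecx, [rax] (64-bit mode)": {
        "no REX": "8b 08",
        "redundant REX 0x40": "40 8b 08",
    },
}


def length(hex_bytes: str) -> int:
    """Number of bytes in a space-separated hex string."""
    return len(bytes.fromhex(hex_bytes))


for insn, forms in ENCODINGS.items():
    print(insn)
    for how, hex_bytes in forms.items():
        print(f"  {length(hex_bytes)} bytes  {hex_bytes:<22} {how}")
```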

There are many other ways to encode an instruction using more or fewer bytes. A good assembler will usually pick the shortest one, but the architecture doesn't require it. The good news is that if you study volume 2 (the instruction set reference) of the Intel IA-32 Software Developer's Manual, you should be able to work it out by yourself.
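To illustrate "use the shortest one", the sketch below encodes the question's sub $0x28,%esp (Intel syntax: sub esp, imm) and picks between the two general forms from the opcode table: 83 /5 ib (sign-extended imm8, 3 bytes) and 81 /5 id (imm32, 6 bytes). This shows the policy only. It is not any particular assembler's code, and `encode_sub_esp` is a hypothetical name.

```python
import struct


def encode_sub_esp(imm: int) -> bytes:
    """Encode `sub esp, imm` for 32-bit code, choosing the shorter form.

    83 /5 ib : imm8, sign-extended to 32 bits (3 bytes)
    81 /5 id : imm32                          (6 bytes)
    ModR/M 0xEC = mod 11, reg /5 (SUB), r/m 100 (esp).
    """
    if not -2**31 <= imm < 2**32:
        raise ValueError("immediate does not fit in 32 bits")
    imm32 = imm & 0xFFFFFFFF                  # value as the CPU sees it
    signed = imm32 - 2**32 if imm32 >= 2**31 else imm32
    modrm = 0xEC
    if -128 <= signed <= 127:                 # fits a sign-extended imm8
        return bytes([0x83, modrm]) + struct.pack("<b", signed)
    return bytes([0x81, modrm]) + struct.pack("<I", imm32)


print(encode_sub_esp(0x28).hex(" "))    # the instruction from the question
print(encode_sub_esp(0x1000).hex(" "))
```

Note that 0xFFFFFFFF is treated as -1 and still gets the 3-byte form, since the sign-extended imm8 0xff produces the same 32-bit value.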

Nathan Fellman
  • Indeed, instruction encoding, particularly on x86/x64, is ambiguous, both in the sense that two assembly mnemonics may describe the same instruction (`xchg ax, ax` and `nop`), and that there might be two binary encodings for one assembly mnemonic (`inc eax` in 32-bit mode: both `0x40` and `0xff 0xc0`). Encodings are sometimes even deliberately ambiguous for a reason, as with `NOP` instructions, see http://stackoverflow.com/questions/2123000/dummy-operations-handling-of-intel-processor/2124407#2124407 – FrankH. Mar 22 '11 at 11:08
  • Many old instructions can be re-encoded with a VEX or EVEX prefix as well, so they'll have multiple representations with different lengths – phuclv Feb 24 '18 at 08:39
  • @LưuVĩnhPhúc: that's not quite accurate. SSE instructions that are encoded as AVX using VEX or EVEX have slightly different behavior: (1) they have different enabling requirements; for instance, VEX requires CR4.OSXSAVE and various bits in XCR0; (2) 128-bit AVX instructions clear the upper bits of the YMM or ZMM register, while the corresponding SSE instructions leave the upper bits untouched – Nathan Fellman Feb 24 '18 at 18:53

The first instruction is at [main+0] and the second is at [main+1], so the first instruction is 1 byte long. The third instruction is at [main+3], so the second instruction is 2 bytes long. You can't tell from the listing how long the third instruction is, since it doesn't show the address of the fourth instruction.

GT.

If possible, have the assembler generate a listing. It shows your source code with the binary representation of each instruction next to it, so all you need to do is count the bytes to get the size.
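The same byte counting works on a disassembler's output when you only have the binary. The Python sketch below assumes the tab-separated layout of GNU `objdump -d` (address with colon, hex bytes, instruction text). The sample lines are constructed for illustration from the encodings of the question's instructions, not captured output. An assembler listing (for example from `nasm -l`) uses different columns, so the parsing would need adjusting. `sizes_from_objdump` is a hypothetical helper.

```python
def sizes_from_objdump(text: str):
    """Count the hex bytes printed for each instruction.

    Assumes objdump -d style lines: "  addr:\\tbytes\\tinstruction".
    Lines holding only bytes (objdump wraps long instructions) are
    added to the preceding instruction.
    """
    result = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue  # section headers, symbol labels, blank lines
        nbytes = len(fields[1].split())
        if len(fields) >= 3 and fields[2].strip():
            result.append([fields[2].strip(), nbytes])
        elif result:
            result[-1][1] += nbytes  # continuation line
    return [tuple(r) for r in result]


SAMPLE = (
    "004012d0 <main>:\n"
    "  4012d0:\t55                   \tpush   %ebp\n"
    "  4012d1:\t89 e5                \tmov    %esp,%ebp\n"
    "  4012d3:\t83 ec 28             \tsub    $0x28,%esp\n"
)
for insn, n in sizes_from_objdump(SAMPLE):
    print(f"{insn:<20} {n} byte(s)")
```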

user3462295

If you only have the assembly code as text, you'll have to run it through an assembler routine to get the binary representation, and thus the size of the instruction(s). Of course, that is architecture-dependent.

For example, here is the open-source 80x86 32-bit assembler code from OllyDbg v1.10.

Nick Dandoulakis