I am reading the book Hacking: The Art of Exploitation, 2nd Edition, which includes this simple C program:

#include <stdio.h>
int main()
{
    int i;  
    for (i = 0; i < 10; i++)
    {
        puts("Hello, world!\n");
    }
    return 0;
}

The book shows that, when debugging with gdb, the first instruction at the breakpoint writes to memory addressed through the ebp register:

(gdb) x/i $eip
0x8048384 <main+16>: mov DWORD PTR [ebp-4],0x0

The book explains: "This assembly instruction will move the value of 0 into memory located at the address stored in the EBP register, minus 4. This is where the C variable i is stored in memory; i was declared as an integer that uses 4 bytes of memory on the x86 processor."

This makes sense to me, but when I repeat exactly the same step on my "very old i386" Linux laptop, here is what I get:

(gdb) x/i $eip
=> 0x4011b6 <main+29>:   mov    DWORD PTR [ebp-0xc],0x0

So my laptop shows [ebp-0xc] instead of [ebp-4]. As I understand it, 0xc in hex is 12, so is that 12 bytes? If so, why?

Here is the whole assembly dump for this simple program on my laptop, from (gdb) disassemble main:

Dump of assembler code for function main:
   0x00401199 <+0>: lea    ecx,[esp+0x4]
   0x0040119d <+4>: and    esp,0xfffffff0
   0x004011a0 <+7>: push   DWORD PTR [ecx-0x4]
   0x004011a3 <+10>:    push   ebp
   0x004011a4 <+11>:    mov    ebp,esp
   0x004011a6 <+13>:    push   ebx
   0x004011a7 <+14>:    push   ecx
   0x004011a8 <+15>:    sub    esp,0x10
   0x004011ab <+18>:    call   0x4010a0 <__x86.get_pc_thunk.bx>
   0x004011b0 <+23>:    add    ebx,0x2e50
=> 0x004011b6 <+29>:    mov    DWORD PTR [ebp-0xc],0x0
   0x004011bd <+36>:    jmp    0x4011d5 <main+60>
   0x004011bf <+38>:    sub    esp,0xc
   0x004011c2 <+41>:    lea    eax,[ebx-0x1ff8]
   0x004011c8 <+47>:    push   eax
   0x004011c9 <+48>:    call   0x401030 <puts@plt>
   0x004011ce <+53>:    add    esp,0x10
   0x004011d1 <+56>:    add    DWORD PTR [ebp-0xc],0x1
   0x004011d5 <+60>:    cmp    DWORD PTR [ebp-0xc],0x9
   0x004011d9 <+64>:    jle    0x4011bf <main+38>
   0x004011db <+66>:    mov    eax,0x0
   0x004011e0 <+71>:    lea    esp,[ebp-0x8]
   0x004011e3 <+74>:    pop    ecx
   0x004011e4 <+75>:    pop    ebx
   0x004011e5 <+76>:    pop    ebp
   0x004011e6 <+77>:    lea    esp,[ecx-0x4]
   0x004011e9 <+80>:    ret    
End of assembler dump.
Peter Cordes
Yong Zhang
  • Can you post your code? Are you within a function? Those extra 8 bytes may have been reserved for the return value and a local parameter, for example. – rturrado Feb 07 '21 at 17:34
  • The code is very simple. `(gdb) list main` shows: `#include <stdio.h>` `int main()` `{` `int i;` `for (i = 0; i < 10; i++)` `{` `puts("Hello, world!\n");` `}` `return 0;` – Yong Zhang Feb 07 '21 at 17:48
  • I definitely need to polish my assembler :) Anyway, consider a few things. 1) You're compiling this in debug mode, aren't you? 2) Move your code to a function just to test whether `ebp-0x4` is indeed used. 3) `main` is a bit of a special function. I wonder if `argc` and `argv` have something to do with these extra 8 bytes (although they are parameters and they should be accessed with `ebp+...`) – rturrado Feb 07 '21 at 18:11
  • Also, what compiler are you using? What's the compilation line? – rturrado Feb 07 '21 at 18:16
  • Your compiler just happens to place the variable at a different offset on the stack than the compiler used by the book's authors did. That shouldn't be cause for surprise - different compiler versions, options, OSes, configuration choices, etc, can cause such things to vary. The exact reasons for this are complex but not really important to understanding how your actual application code works. – Nate Eldredge Feb 07 '21 at 19:11
  • Here is how I compiled on my i386 laptop: "gcc -g firstprog.c", with gcc version 8.3.0 (Debian 8.3.0-6). The assembly output on my laptop differs from what the book shows, and I can no longer tell which memory address the string "Hello, world" is stored at. – Yong Zhang Feb 07 '21 at 19:31
  • Your string's address comes from `lea eax,[ebx-0x1ff8]`, where `ebx` was previously set based on the address of your code. This is necessary when you have position-independent code, which the book's example probably wasn't using. Compiling with `-fno-pie` may simplify things. Again, variations like this are just part of what you have to get used to if you don't have precisely the same compiler / configuration as the book did. – Nate Eldredge Feb 07 '21 at 19:45
  • Thanks for all the comments. Now I understand that a different compiler generates different assembly code. I will just enjoy the book and play around with it. Thanks – Yong Zhang Feb 07 '21 at 20:50

1 Answer

sub esp,0x10

allocates 16 bytes (four 32-bit registers' worth) of stack space for local variables and other scratch data.

mov DWORD PTR [ebp-0xc],0x0

is the first reference to the slot at ebp-0xc, and it initializes that slot to zero. Given cmp DWORD PTR [ebp-0xc],0x9 at main+60, I'm certain this is i = 0 from the initialization clause of the for loop.

The compiler is free to place variables wherever it likes. The choice is deterministic for a given compiler build and set of options, but it can change even between patch versions of the compiler.

Joshua
  • Worth mentioning that it's reserving more stack space to maintain/restore 16-byte stack alignment before a `call puts`, and/or to save EBX because `-fPIE -pie` was enabled by default in the OP's GCC version. The `call 0x4010a0 <__x86.get_pc_thunk.bx>` won't be there in the book, either - that's a sign of position independent code which is inefficient in 32-bit mode. – Peter Cordes Feb 07 '21 at 20:30
  • [Compiling C to 32-bit assembly with GCC doesn't match a book](https://stackoverflow.com/q/64109864) is for a different book but has some detail about the `lea ecx,[esp+0x4]` / `and` stuff to align ESP, and the `__x86.get_pc_thunk`. – Peter Cordes Feb 07 '21 at 20:37
  • Generic Q&A about this specific book: ["Hacking: The Art of Exploitation" - Assembly Inconsistencies in book examples vs. my system's gcc](https://stackoverflow.com/q/27053865). ["Art of Exploitation" disassembly example isn't the same (C code)](https://stackoverflow.com/q/6975617) is this specific code from the same book, but with AT&T syntax. The answers are different, so I'm reluctant to just close as duplicate in either direction. (Although that question is less specific.) – Peter Cordes Feb 07 '21 at 20:52
  • Thanks for all the links. My goal is not to reproduce exactly the same assembly output, but to understand how the data is stored in memory and to retrieve it as shown in the book. @Nate Eldredge's comment above helped me understand the assembly output of the newer compiler, and I did retrieve exactly the string I was looking for from memory. That was the real reason I asked this question. Thanks again. – Yong Zhang Feb 07 '21 at 21:21