
I'm learning the basics of reverse engineering. While reversing a crackme, I noticed this pattern at the beginning of almost every function:

pushl %ebp                            
movl  %esp, %ebp              
pushl %ebx              # because ebx is a callee-saved register
subl  $0x14,%esp        # of course $0x14 changes depending on the function
calll 0x08048766
addl  $0x1a5f, %ebx     # this value also sometimes changes depending on the function

Where at 0x08048766 there is a function that does just this:

movl 0(%esp), %ebx         
retl 

So, as usual, every function first sets up its stack frame using ebp and esp. Then ebx is pushed onto the stack, which is also understandable: ebx is a callee-saved register, and it is used later in the function to reference some static data (from .rodata), for example:

leal  -0x17b7(%ebx), %eax
movl  %eax, 0(%esp) 
calll printf   

Now the most interesting (and, to me, obscure) part: if I have understood correctly, ebx is first initialized with the value that esp points to (via the function at 0x08048766). Why? What is stored there? Isn't it an uninitialized location further down the stack?

Then another value is added to ebx. What does this value represent?

I would like to understand better how the register ebx is used in this case, and how to calculate the address it is pointing to.

You can have a look at the complete program here, but unfortunately there isn't any C source code available.

Cody Gray - on strike
TTK
  • @Abhineet Nope. The OP understands those. As you can see in my answer, `ebx` is the PIC register in this code, and you'll find nothing about PIC on that wikipedia page. Your comment isn't useful here, sorry. – Jonathon Reinhart Jul 14 '16 at 11:59
  • @JonathonReinhart Okay, so I understand that I have made a useless comment. Can you kindly post some links/references to read more on GOT and PIC? Is PIC same as the ASLR in Windows? Is -fPIC == /DYNAMICBASE (VC++)? – Abhineet Jul 14 '16 at 12:10
  • PIC is not needed on Windows. PE files are relocatable - the loader fixes them up when they're loaded (which includes internal references to global variables). [`/DYNAMICBASE`](https://msdn.microsoft.com/en-us/library/ff820372.aspx) doesn't change anything except set a flag in the header to tell the loader that it is free to be randomly rebased at runtime. – Jonathon Reinhart Jul 14 '16 at 12:14
  • @JonathonReinhart Thanks, if I can, I would have chosen this as accepted comment. :) – Abhineet Jul 14 '16 at 12:25

1 Answer


This code appears to have been compiled with -fPIC. PIC stands for "position-independent code", meaning the code can be loaded at any address and still access its global variables.

In this case ebx is known as the PIC register, and it's used to point to the end of the GOT (the global offset table). The GOT holds one entry per global symbol the code references; the dynamic linker fills each entry with that symbol's run-time address when the program is loaded.

Often, the best way to learn about these kinds of things is to compile some code yourself and look at the output. It is especially helpful to have your own symbols to look at.

Let's do an experiment:

pic.c

int global;

int main(void)
{
    global = 4;
    return 0;
}

Compile

$ gcc -v
...
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)

$ gcc -m32 -Wall -Werror -fPIC -o pic pic.c

Sections (abbreviated)

$ readelf -S pic
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [13] .text             PROGBITS        080482f0 0002f0 000182 00  AX  0   0 16
  [15] .rodata           PROGBITS        08048488 000488 00000c 00   A  0   0  4
  [22] .got              PROGBITS        08049ffc 000ffc 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        0804a000 001000 000014 04  WA  0   0  4
  [24] .data             PROGBITS        0804a014 001014 000004 00  WA  0   0  1
  [25] .bss              NOBITS          0804a018 001018 000008 00  WA  0   0  4

Disassemble (Intel syntax because AT&T drives me nuts)

$ objdump -Mintel -d --no-show-raw-insn pic

080483eb <main>:
 80483eb:   push   ebp
 80483ec:   mov    ebp,esp
 80483ee:   call   804840b <__x86.get_pc_thunk.ax> ; EAX = EIP + 5
 80483f3:   add    eax,0x1c0d            ; EAX = 0x804a000 (.got.plt, end of .got)
 80483f8:   lea    eax,[eax+0x1c]        ; EAX = 0x804a01C (.bss + 4)

 80483fe:   mov    DWORD PTR [eax],0x4   ; set `global` to 4
 8048404:   mov    eax,0x0
 8048409:   pop    ebp
 804840a:   ret    

0804840b <__x86.get_pc_thunk.ax>:
 804840b:   mov    eax,DWORD PTR [esp]
 804840e:   ret    
 804840f:   nop

Explanation

In this case, my GCC decided to use eax as the PIC register instead of ebx.

Also, note that the compiler (GCC 5.3.1) did something interesting here. Instead of accessing the variable via the GOT, it essentially used the GOT as an "anchor" and applied a fixed offset from it to reach the variable directly in the .bss section.


Back to your code:

pushl %ebp
movl %esp, %ebp
pushl %ebx             # because ebx is a callee-saved register
subl $0x14,%esp        # end of typical prologue

calll 0x08048766       # __i686_get_pc_thunk_bx
                       # Loads the address of the instruction after this call
                       # (the return address) into EBX.
                       # 32-bit x86 cannot read EIP directly, so a call is needed.

addl $0x1a5f, %ebx     # Add the displacement to the end of the GOT.
                       # This displacement of course changes depending on
                       # where the function is.
                       # EBX now points to the end of the GOT.

leal -0x17b7(%ebx), %eax    # EAX = EBX - 0x17b7
movl %eax, 0(%esp)          # Put EAX on stack (arg 0 to printf)
                            # EAX should point to some string
calll printf

Your code didn't actually "use" the GOT either (otherwise we would see a second memory dereference); it used it as an anchor to reach the string, which is probably in the read-only data section (.rodata). Like .text, .rodata comes before the GOT, hence the negative offset.

If you look at the function at 0x08048766, you'll see it looks something like this:

mov    (%esp),%ebx  # Put return address (pushed onto stack by call insn)
                    # in ebx
ret                 # Return
Jonathon Reinhart
  • This is a very detailed and clear answer and without any doubt the best answer I could have looked for. Thank you! (just a little fix: EAX + 0x804a01C should be EAX = 0x804a01C , I think) – TTK Jul 14 '16 at 12:00
  • @TTK You're welcome. Thanks for pointing that out; I've fixed it. I highly suggest writing code (much bigger, more complex programs than I've shown here), and disassembling them. – Jonathon Reinhart Jul 14 '16 at 12:04
  • @TTK Also, you might look at the section headers (`readelf -S`) like I did, to verify my analysis of your program. With only the disassembly you provided (which is sufficient for your question, no worries), it's hard to tell for sure. But it makes sense, because `printf` expects a pointer to a string (not a GOT entry). – Jonathon Reinhart Jul 14 '16 at 12:05
  • Worth noting that PIC is much simpler in x86-64 code because of RIP-relative addressing modes. `rbx` isn't tied up holding a pointer to the GOT. So all this extra overhead only applies to 32bit x86 PIC. (But PIC still isn't free on x86-64, especially if you aren't careful to declare as many functions and variables as possible `static`. Calls to non-static functions within a shared library still indirect through the PLT.) – Peter Cordes Jul 14 '16 at 19:07
  • @PeterCordes Good point. I was debating whether or not to bring x86-64 into the picture, but you're absolutely right. Re: *"Calls to non-static functions within a shared library still indirect through the PLT."* IIRC there was some work to make this not happen anymore. If the visibility of the symbols is "hidden" (so they're not visible outside the library), there's no way they can be overridden by other dynamic objects, etc. so I don't see why those calls would ever need to indirect. – Jonathon Reinhart Jul 14 '16 at 19:32