
I am currently trying to understand Writing buffer overflow exploits - a tutorial for beginners.

The C code, compiled with cc -ggdb exploitable.c -o exploitable

#include <stdio.h>

void exploitableFunction (void) {
    char small[30];
    gets (small);
    printf("%s\n", small);
}

main() {
    exploitableFunction();
    return 0;
}

seems to have the assembly code

0x000000000040063b <+0>:    push   %rbp
0x000000000040063c <+1>:    mov    %rsp,%rbp
0x000000000040063f <+4>:    callq  0x4005f6 <exploitableFunction>
0x0000000000400644 <+9>:    mov    $0x0,%eax
0x0000000000400649 <+14>:   pop    %rbp
0x000000000040064a <+15>:   retq

I think it does the following, but I'm really not sure about it and I would like to hear from somebody who is experienced with assembly code if I'm right / what is right.

  • 40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)
  • 40063c: Copy the value from the stack pointer register into the base pointer register (why?)
  • 40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)
  • 400644: Copy the value from the address $0x0 to the EAX register
  • 400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)
  • 40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)
Martin Thoma
  • 3
    I'm sure someone will give you more details in an answer, but the +0, +1, +14, and +15 are the standard function setup (prolog and epilog) on Intel, that set up a stack frame. – Max Sep 02 '15 at 15:38
  • 2
    I suggest you do some reading on x86 assembly language. You can't expect to learn the basics by asking a question about every single assembly line you see. – interjay Sep 02 '15 at 15:40
  • 1
    Moreover, do be aware that there is no specified association between C code and any particular assembly or machine code. That's *entirely* up to the chosen C implementation. – John Bollinger Sep 02 '15 at 15:41
  • 1
    [`main()` is wrong](http://stackoverflow.com/q/204476/995714), and [`gets()` is deprecated](http://stackoverflow.com/q/7423691/995714) – phuclv Sep 02 '15 at 15:42
  • To add to what Max said: the instructions you see here are the saving of the location to return to, the loading of the location to move to where the function's code is; and the reverse of the same. i.e. it's the application of the calling convention. – Toby Sep 02 '15 at 15:42
  • 1
    Literals with a `$` prefix are constants (immediate values), not addresses – phuclv Sep 02 '15 at 15:43
  • 2
    @LưuVĩnhPhúc This is not really the point of the question, I feel... OP wants to understand how precisely the exploit works; The name of the function `exploitableFunction()` is a dead giveaway that OP is aware of how grossly bad `gets()` is. – Iwillnotexist Idonotexist Sep 02 '15 at 15:44
  • 2
    Also, see http://stackoverflow.com/questions/1395591/what-is-exactly-the-base-pointer-and-stack-pointer-to-what-do-they-point – Toby Sep 02 '15 at 15:46
  • `push rbp; mov rbp, rsp` is the function prolog http://stackoverflow.com/q/14296088/995714 – phuclv Sep 02 '15 at 15:47
  • 1
    You appear to be asking about [calling conventions (cdecl, stdcall, etc.)](https://en.m.wikipedia.org/wiki/X86_calling_conventions); you should find lots of articles on the 'net if you search for any of these terms. – stakx - no longer contributing Sep 02 '15 at 16:23
  • Note that some compilers (like Microsoft) have an option to turn off frame pointers, eliminating the rbp / rsp stuff. Function parameters would be accessed via rsp instead of rbp. – rcgldr Sep 03 '15 at 00:39

3 Answers

2

40063b: Put the address which is currently in the base pointer register into the stack segment (How is this register initialized? Why is that done?)

You want to save the base pointer because it is probably used by the calling function.

40063c: Copy the value from the stack pointer register into the base pointer register (why?)

This gives you a fixed position into the stack, which might contain parameters for the function. It can also be used as a base address for any local variables.

40063f: Call exploitableFunction (What exactly does it mean to "call" a function in assembly? What happens here?)

"call" means pushing the return address (address of the next instruction) onto the stack, and then jumping to the start of the called function.

400644: Copy the value from the address $0x0 to the EAX register

It is actually the value 0 from the return statement.

400649: Copy the value from the top of the stack (determined by the value in %rsp) into the base pointer register (seems to be confirmed by Assembler: Push / pop registers?)

This restores the base pointer we saved at the top. The calling function might assume that we do.

40064a: Return (the OS uses what is in %EAX as return code - so I guess the address $0x0 contains the constant 0? Or is that not an address but the constant?)

It was the constant from return 0. Using EAX for a small return value is a common convention.

Bo Persson
  • "You want to save the base pointer because it is probably used by the calling function." - what is the calling function of `main`? Or is that simply done because it is simpler not to treat `main` different from other functions? – Martin Thoma Sep 02 '15 at 15:56
  • 2
    @moose The `main` function in C is a function like any other; its only special property is that it is called from the CRT (C RunTime) at the very start of user code. The CRT is where the OS starts executing your program. The whole sequence is standard and is called the prologue; together with the epilogue, the reverse sequence executed just before leaving the function, it defines a standard `stack-frame`. For more info you can google. – Frankie_C Sep 02 '15 at 15:59
1

I found a link with code similar to yours and a full explanation.

  • 40063b: Push the old base pointer onto the stack to save it for later. It is pushed because main is not the only function in the program: some other function called it and still needs its own base pointer.
  • 40063c: Copy the value of the stack pointer into the base pointer. After this, %rbp points to the base of main’s stack frame.
  • 40063f: Call the function at address 0x4005f6. This pushes the program counter (the address of the next instruction, 0x400644 here) onto the stack and loads 0x4005f6 into the program counter. When the function returns, the saved address is popped off the stack back into the program counter.
  • 400644: Copy 0 into %eax. The x86 calling convention dictates that a function’s return value is stored in %eax.
  • 400649: Pop the old base pointer off the stack and store it back in %rbp.
  • 40064a: Jump back to the return address, which is also stored on the stack. Because this is main, returning here leads to the end of the program.

Also, you didn't show the assembly code for exploitableFunction; this is only the main function.

Nasr
  • 2
    [link only answers are discouraged](http://meta.stackexchange.com/q/8231/230282), as they will have no meaning when the link dies – phuclv Sep 02 '15 at 15:48
  • Although Link answers are discouraged, I'm not sure why this got downvoted. The link was quite helpful so far (and I'm not done with reading it by now). +1. – Martin Thoma Sep 02 '15 at 16:09
  • `%rbp` is pushed because it is the caller's stack-frame pointer, and the caller needs it upon return. – Paul Ogilvie Sep 02 '15 at 18:26
1

The function entry saves ebp and copies esp into ebp. All parameters of the function are then addressed relative to ebp. This is the standard 32-bit cdecl convention (in Intel syntax):

; int example(char *s, int i)
    push        ebp                     ; save the caller's value of ebp
    mov         ebp, esp                ; set up our base pointer to the stack frame
    sub         esp, 16                 ; room for automatic variables
    mov         eax, dword ptr [ebp+8]  ; eax has s
    mov         ecx, dword ptr [ebp+12] ; ecx has i
    ...                                 ; do your thing
    mov         eax, dword ptr [result] ; function return value in eax
    mov         esp, ebp                ; release the automatic variables
    pop         ebp                     ; restore caller's base pointer
    ret

When calling this function, the compiler pushes the parameters onto the stack, last parameter first, and then calls the function. Upon return, the caller cleans up the stack:

; i = example(myString, k);
    mov         eax, dword ptr [ebp+16] ; this gets a parameter of the current function
    push        eax                     ; this will be parameter i
    mov         eax, dword ptr [ebp-16] ; this gets a local variable
    push        eax                     ; this is parameter s
    call        example
    add         esp, 8                  ; remove the pushed parameters from the stack
    mov         dword ptr [i], eax      ; save return value - always in eax

Different compilers and platforms use different conventions (on x86-64, as in the question, the first few arguments are normally passed in registers rather than on the stack), but I think the above is the basics of calls in C (using cdecl).

Paul Ogilvie