5

Hello and regards to all. I have a C program that I wrote to test a buffer overflow.

    #include<stdio.h>
    void display()
    {
            char buff[8];
            gets(buff);
            puts(buff);
    }
    main()
    {
        display();
        return(0);
    }

I then disassembled the display and main functions using GDB. The output:

Dump of assembler code for function main:

    0x080484ae <+0>:    push   %ebp        # saving ebp to stack
    0x080484af <+1>:    mov    %esp,%ebp   # saving esp in ebp
    0x080484b1 <+3>:    call   0x8048474 <display>   # calling display function
    0x080484b6 <+8>:    mov    $0x0,%eax   # move 0 into eax , but WHY ????
    0x080484bb <+13>:   pop    %ebp        # remove ebp from stack
    0x080484bc <+14>:   ret                # return

End of assembler dump.

Dump of assembler code for function display:

    0x08048474 <+0>:    push   %ebp          #saves ebp to stack        
    0x08048475 <+1>:    mov    %esp,%ebp     # saves esp to ebp
    0x08048477 <+3>:    sub    $0x10,%esp    # making 16 bytes space in stack
    0x0804847a <+6>:    mov    %gs:0x14,%eax  # what does it mean ????
    0x08048480 <+12>:   mov    %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
    0x08048483 <+15>:   xor    %eax,%eax       # xor eax with itself (but WHY??)
    0x08048485 <+17>:   lea    -0xc(%ebp),%eax  # Load effective address of 12 bytes lower placed value ( WHY???? )

    0x08048488 <+20>:   mov    %eax,(%esp)      #make esp point to the address inside of eax
    0x0804848b <+23>:   call   0x8048374 <gets@plt>  # calling get, what is "@plt" ????
    0x08048490 <+28>:   lea    -0xc(%ebp),%eax       # LEA of 12 bytes lower to eax
    0x08048493 <+31>:   mov    %eax,(%esp)         # make esp point to eax contained address
    0x08048496 <+34>:   call   0x80483a4 <puts@plt>  # again what is "@plt" ????
    0x0804849b <+39>:   mov    -0x4(%ebp),%eax    # move (ebp - 4) location's contents to eax
    0x0804849e <+42>:   xor    %gs:0x14,%eax         # # again what is this ????
    0x080484a5 <+49>:   je     0x80484ac <display+56> # Not known to me
    0x080484a7 <+51>:   call   0x8048394 <__stack_chk_fail@plt>  # not known to me
    0x080484ac <+56>:   leave                        # a new instruction, not known to me
    0x080484ad <+57>:   ret                          # return to MAIN's next instruction

End of assembler dump.

So folks, please look over my homework. I understand the rest of the code, except for a few lines. I have put a big "WHY ????" and some other questions in the comments next to each of them. The first hurdle for me is the "mov %gs:0x14,%eax" instruction; I can't draw a flow chart past it. Could somebody please explain what these few instructions are for and what they do in the program? Thanks...

starblue
kriss
  • In main you return 0, that's why `mov $0x0,%eax`. – Qiau Sep 02 '12 at 09:31
  • `xor %eax,%eax` is a performance-efficient way to clear %eax, since XORing a value with itself always yields 0. – Qiau Sep 02 '12 at 09:34
  • The manipulation of `gs:0x14` looks like a [stack canary](http://en.wikipedia.org/wiki/Stack_buffer_overflow#Stack_canaries). `xor %eax, %eax` is simply a way of setting `eax` to `0`. `lea -0xc(%ebp), %eax` loads the address of your `buff` into `eax`, so it can be passed into `gets/puts`. – DCoder Sep 02 '12 at 09:34
  • Thanks a lot, Qiau and DCoder... :-) – kriss Sep 02 '12 at 10:14
  • The PLT is the Procedure Linkage Table. `gets` and `puts` live in a dynamic library, so their addresses are not known when the program is linked. Each time you load the program, the dynamic libraries it needs are loaded too. Either at that time or later, the first time you call a function from a dynamic library, its address is resolved and put into the GOT (Global Offset Table). When you call `gets@plt`, it jumps indirectly through the corresponding GOT entry to the function (or to a routine that resolves the address, if it is still unresolved). – ninjalj Sep 02 '12 at 11:08
  • Also note that `gets` should __never__ be used. If it receives more bytes than `buff` has room for, it will cause a buffer overflow. – ninjalj Sep 02 '12 at 11:09
  • `xor %eax,%eax` zeros out %eax because the canary value has been moved into place on the stack and doesn't need to be in a register any more – wheresmycookie Jul 10 '13 at 03:29
  • Didn't find this information easily on the web, @ninjalj. *Really* needed to know what `@PLT` means on a symbol. Thanks a lot. :) – paulotorrens Mar 06 '14 at 17:59
  • [Why is the gets function so dangerous that it should not be used?](https://stackoverflow.com/q/1694036/995714), [What is the best way to set a register to zero in x86 assembly: xor, mov or and?](https://stackoverflow.com/q/33666617/995714) – phuclv Mar 14 '19 at 14:57
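
The lazy resolution described in the PLT comment above can be modelled in plain C. This is only a sketch under stated assumptions: `got_slot`, `resolver_stub`, `real_twice`, `twice_plt` and `times_resolved` are hypothetical names standing in for a GOT entry, the dynamic linker's resolver, a library function, its PLT stub and a helper for inspection. The real mechanism is set up by the linker and loader, not by C function pointers.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

/* The "library" function whose address is not known at link time. */
static int real_twice(int x) { return 2 * x; }

static int resolver_stub(int x);

/* One GOT-like slot; it starts out pointing at the resolver. */
static func_t got_slot = resolver_stub;
static int resolve_count = 0;

/* The first call lands here: find the target, patch the slot,
 * then forward the call to the real function. */
static int resolver_stub(int x)
{
    resolve_count++;
    got_slot = real_twice;
    return got_slot(x);
}

/* Plays the role of twice@plt: always an indirect call through the slot. */
int twice_plt(int x)
{
    assert(got_slot != NULL);
    return got_slot(x);
}

/* Helper for inspection: how many times resolution has run. */
int times_resolved(void) { return resolve_count; }
```

Calling `twice_plt` twice runs the resolver only on the first call; afterwards the slot already holds the final address, which is the point of resolving lazily.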

4 Answers

13
    0x080484b6 <+8>:    mov    $0x0,%eax   # move 0 into eax , but WHY ????

Don't you have this?:

    return(0);

They are probably related. :)

    0x0804847a <+6>:    mov    %gs:0x14,%eax  # what does it mean ????

It means reading 4 bytes into eax from memory at address gs:0x14. gs is a segment register. Most likely thread-local storage (AKA TLS) is referenced through this register.

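    0x08048483 <+15>:   xor    %eax,%eax       # xor eax with itself (but WHY??)

It sets eax to 0, since XORing a value with itself always gives 0. The value loaded from gs:0x14 has already been stored at -0x4(%ebp) by this point, so the copy left in the register is no longer needed and gets cleared.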

    0x08048485 <+17>:   lea    -0xc(%ebp),%eax  # Load effective address of 12 bytes lower placed value ( WHY???? )

It makes eax point to a local variable that lives on the stack. sub $0x10,%esp allocated some space for them.

    0x08048488 <+20>:   mov    %eax,(%esp)      #make esp point to the address inside of eax

Wrong. It writes eax to the stack, to the stack top. It will be passed as an on-stack argument to the called function:

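    0x0804848b <+23>:   call   0x8048374 <gets@plt>  # calling get, what is "@plt" ????

The @plt suffix is not name mangling. It marks a call that goes through the Procedure Linkage Table (PLT), the stub the linker generates for functions that live in a dynamic library, as gets and puts do.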

By now you should've guessed what local variable that was. buff, what else could it be?

    0x080484ac <+56>:   leave                        # a new instruction, not known to me

Why don't you look it up in the CPU manual?

Now, I can probably explain the gs/TLS thing to you...

    0x08048474 <+0>:    push   %ebp          #saves ebp to stack
    0x08048475 <+1>:    mov    %esp,%ebp     # saves esp to ebp
    0x08048477 <+3>:    sub    $0x10,%esp    # making 16 bytes space in stack
    0x0804847a <+6>:    mov    %gs:0x14,%eax  # what does it mean ????
    0x08048480 <+12>:   mov    %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
    ...
    0x0804849b <+39>:   mov    -0x4(%ebp),%eax    # move (ebp - 4) location's contents to eax
    0x0804849e <+42>:   xor    %gs:0x14,%eax         # # again what is this ????
    0x080484a5 <+49>:   je     0x80484ac <display+56> # Not known to me
    0x080484a7 <+51>:   call   0x8048394 <__stack_chk_fail@plt>  # not known to me
    0x080484ac <+56>

So, this code takes a value from the TLS (at gs:0x14) and stores it right below the saved ebp value (at ebp-4). Then there's your stuff with gets() and puts(). Then this code checks whether the copy of the value from the TLS is unchanged. xor %gs:0x14,%eax does the compare.

If XORed values are the same, the result of the XOR is 0 and flags.zf is 1. Else, the result isn't 0 and flags.zf is 0.

je 0x80484ac <display+56> checks flags.zf and skips call 0x8048394 <__stack_chk_fail@plt> if flags.zf = 1. IOW, this call is skipped if the copy of the value from the TLS is unchanged.

What is that all about? That's a way to try to catch a buffer overflow. If you write beyond the end of the buffer, you will overwrite that value copied from the TLS to the stack.

Why do we take this value from the TLS, and not just use a constant, hard-coded value? We probably want to use different, non-hard-coded values to catch overflows more often (so the value in the TLS can change from one run of your program to the next, and it may differ between threads of your program). That also lowers an attacker's chances of successfully exploiting the buffer overflow if the value is chosen randomly each time your program runs.

Finally, if the copy of the value is found to have been overwritten due to a buffer overflow, call 0x8048394 <__stack_chk_fail@plt> will call a special function dedicated to doing whatever's necessary, e.g. reporting a problem and terminating the program.
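
The whole sequence can be mimicked in portable C. This is a hedged model, not what gcc actually emits: `struct frame`, `guard` and `display_model` are hypothetical names, the guard is a made-up constant rather than a random per-run value, and the copy is clamped to the size of the model frame so that the sketch itself stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of display()'s frame: the 8-byte buffer sits just
 * below the guard copy, as buff (ebp-0xc) and the canary (ebp-4) do. */
struct frame {
    char     buff[8];
    uint32_t canary;   /* copy of the guard, checked before returning */
};

/* Stand-in for the value at %gs:0x14 (made-up constant). */
static const uint32_t guard = 0x5ca1ab1eu;

/* Copies len bytes of input into the frame starting at buff, without
 * checking against the size of buff (like gets), then repeats the
 * epilogue test. Returns 1 if the canary was clobbered, 0 if intact. */
int display_model(const char *input, size_t len)
{
    struct frame f;
    assert(input != NULL);

    memset(&f, 0, sizeof f);
    f.canary = guard;          /* mov %gs:0x14,%eax ; mov %eax,-0x4(%ebp) */

    if (len > sizeof f)        /* keep the model itself in bounds */
        len = sizeof f;
    memcpy(&f, input, len);    /* unchecked write that starts at buff */

    /* xor %gs:0x14,%eax ; je ... : a zero result means "unchanged" */
    return (f.canary ^ guard) != 0;   /* 1: __stack_chk_fail would be called */
}
```

With this model, an 8-byte input leaves the canary alone, while anything longer spills into it and makes the XOR result nonzero.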

Alexey Frunze
  • Thanks Alexey, I wish I could give you more than one "useful post". Thanks a lot for such a nice explanation. – kriss Sep 02 '12 at 10:29
  • Alexey! I have a question. About the instructions call 0x8048374 ;; lea -0xc(%ebp),%eax ;; mov %eax,(%esp): why is this done? The input value is already on the stack, so why does the program load the address into eax with lea and then move eax to the stack again? The input is already there, so why repeat the process? – kriss Sep 02 '12 at 10:56
  • The pointer to the array, which you pass as an argument to `gets()` and `puts()`, can be changed by `gets()` any way it wants. The calling code makes sure the argument of `puts()` is correct. – Alexey Frunze Sep 02 '12 at 17:13
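
As a small illustration of the last comment: in C a parameter is the callee's own copy, and the callee may change it freely. At the machine level that copy is the stack slot the caller filled with `mov %eax,(%esp)`, which is why the slot is written again before `puts` is called. The function below is a hypothetical example, not code from the program above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: it walks its own parameter forward, much as a
 * library routine is free to overwrite the slot holding its argument. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != '\0') {   /* modifies the callee's copy of the pointer */
        s++;
        n++;
    }
    return n;
}
```

The caller's own pointer still refers to the start of the string afterwards, but the caller cannot assume the argument slot it wrote still holds that value.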
9
    0x0804849e <+42>:   xor    %gs:0x14,%eax         # # again what is this ????
    0x080484a5 <+49>:   je     0x80484ac <display+56> # Not known to me
    0x080484a7 <+51>:   call   0x8048394 <__stack_chk_fail@plt>  # not known to me
    0x080484ac <+56>:   leave                        # a new instruction, not known to me
    0x080484ad <+57>:   ret                          # return to MAIN's next instruction

The gs segment can be used for thread local storage. E.g. it's used for errno, so that each thread in a multi-threaded program effectively has its own errno variable.

The function name above is a big clue. This must be a stack canary.

(leave is a CISC instruction that undoes the function prologue before the actual ret: it is equivalent to mov %ebp,%esp followed by pop %ebp.)

sourcejedi
  • Thanks, friends. Sorry for being late to thank you all for replying. I am a beginner in assembly, so I am googling the new terms used in your answers. – kriss Sep 02 '12 at 10:09
  • +1 for the term "stack canary" as well as the rest of the explanation. good info, thanks – jcomeau_ictx Apr 09 '13 at 05:20
4

Others already explained the GS thing (has to do with threads)..

    0x08048483 <+15>:   xor    %eax,%eax       # xor eax with itself (but WHY??)

Explaining this requires some history of the X86 architecture:

The xor eax, eax instruction clears all bits in register eax (loads a zero), but as you've already found, this seems unnecessary because the register gets loaded with a new value in the next instruction.

However, xor eax, eax does something else on the x86 as well. You probably know that you can access parts of the register eax by using al, ah and ax. It has been that way since the 386, and it was okay back then when eax really was a single register.

However, this is no longer the case. The registers that you see and use in your code are just placeholders. Internally, the CPU works with many more registers and a completely different instruction set. The instructions that you write are translated into this internal instruction set.

If you use AL, AH and EAX for example you are using three different registers from the CPU point of view.

Now if you access EAX after you have used AL or AH, the CPU has to merge back these different registers to build a valid EAX value.
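
What "merging" means architecturally can be sketched with masks and shifts. This only models the visible result of writing AL or AH, not the renaming hardware or its cost; `write_al`, `write_ah`, `xor_self` and `partial_register_demo` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Writing AL replaces the low 8 bits and must keep the other 24. */
uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}

/* Writing AH replaces bits 8..15 and must keep the rest. */
uint32_t write_ah(uint32_t eax, uint8_t ah)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)ah << 8);
}

/* xor %eax,%eax: the result does not depend on the old contents. */
uint32_t xor_self(uint32_t eax)
{
    return eax ^ eax;
}

/* Small self-check tying the three together. */
void partial_register_demo(void)
{
    uint32_t eax = 0x11223344u;
    eax = write_al(eax, 0xAA);
    assert(eax == 0x112233AAu);
    eax = write_ah(eax, 0xBB);
    assert(eax == 0x1122BBAAu);
    assert(xor_self(eax) == 0);
}
```

The partial writes need the old value of the full register to produce the new one, whereas the XOR result is zero whatever was there before.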

The line:

    0x08048483 <+15>:   xor    %eax,%eax       # xor eax with itself (but WHY??)

It does not only clear register eax. It also tells the CPU that all renamed sub-registers (AL, AH and AX) can now be considered invalidated (set to zero), so the CPU does not have to do any sub-register merging.

Why is the compiler emitting this instruction?

Because the compiler does not know in which context display() will get called. You may call it from a piece of code that does lots of byte arithmetic using AL and AH. If it did not clear the EAX register via XOR, the CPU would have to do the costly register merging, which takes a lot of cycles.

So doing this extra work at the function start improves performance. It is unnecessary in your case, but since the compiler can't know that, it emits the instruction to be sure.

Nils Pipenbrinck
2

The __stack_chk_fail call is part of gcc's buffer overflow check. It uses libssp (stack smashing protection): your mov at the beginning sets up a guard on the stack, and the xor %gs:0x14,%eax is a check that the guard is still OK. When it is OK, the code jumps to the leave (check the assembler docs for it; it is a helper instruction for stack handling) and skips the call to __stack_chk_fail, which would abort the program and emit an error message.

You can disable the emitting of this overflow check with the gcc option -fno-stack-protector.

And as already mentioned in the comments, xor x,x is just a quick way to clear x, and the final mov $0x0,%eax sets the return value of your main.

Totem
flolo