15

I'm self-studying how compilers works. I'm learning by reading the disassembly of GCC generated code from small 64-bit Linux programs.

I wrote this C program:

#include <stdio.h>

int main()
{
    for(int i=0;i<10;i++){
        int k=0;
    }
}

After using objdump I get:

00000000004004d6 <main>:
  4004d6:       55                      push   rbp
  4004d7:       48 89 e5                mov    rbp,rsp
  4004da:       c7 45 f8 00 00 00 00    mov    DWORD PTR [rbp-0x8],0x0
  4004e1:       eb 0b                   jmp    4004ee <main+0x18>
  4004e3:       c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
  4004ea:       83 45 f8 01             add    DWORD PTR [rbp-0x8],0x1
  4004ee:       83 7d f8 09             cmp    DWORD PTR [rbp-0x8],0x9
  4004f2:       7e ef                   jle    4004e3 <main+0xd>
  4004f4:       b8 00 00 00 00          mov    eax,0x0
  4004f9:       5d                      pop    rbp
  4004fa:       c3                      ret    
  4004fb:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

Now I have some doubts.

  1. What is that NOP at the end for, and why is it there? (alignment?)

  2. I'm compiling with gcc -Wall <program.c>. Why am I not getting the warning control reaches end of non-void function?

  3. Why doesn't the compiler allocate space on the stack with sub rsp,0x10? Why doesn't it use the rbp register for referencing local stack data?

    PS: If I call a function (like printf) in the for loop, why does the compiler suddenly generate sub rsp,0x10? Why does it still references local data with the rsp register. I expect the generated code to reference local stack data with rbp!

Hatted Rooster
  • 35,759
  • 6
  • 62
  • 122
Ofey
  • 167
  • 7
  • 11
    a little tip: if you want to play around with compilers check out : https://godbolt.org/ – AndersK Mar 24 '17 at 08:36
  • 5
    None of the answers really captures why #3 occurs. You will find this in Linux 64-bit code. The compiler is taking advantage of the [Red Zone](https://en.wikipedia.org/wiki/Red_zone_(computing)). You can find information of this red zone in the [64-bit System V Linux ABI](https://www.uclibc.org/docs/psABI-x86_64.pdf) . In that copy of the document review _section 3.2.2: Stack Frame_ .Since your function is a leaf function (doesn't call other functions) it can take advantage of the fact that 128 bytes below the current _RSP_ won't be clobbered by signal handling etc. – Michael Petch Mar 24 '17 at 09:46
  • 2
    Because the 128 bytes below the current value of _RSP_ (in leaf functions) are safe there is no need to adjust _RSP_ if the functions stack based data can fit in that 128 bytes. That is why you don't see _RSP_ adjusted here. One observation. The Red Zone doesn't exist in 32-bit Linux, so compiling as 32-bit code you should see the behaviour you might have expected. – Michael Petch Mar 24 '17 at 09:58
  • I see “_suggested edit queue is full_”; I hope someone fixes: ► title should mention gcc; ► gcc tag?; ► “disassemblying” → “disassembling”; ► “write” → “generates”; ► “use” → “uses”; ► “like is it's” → “as if it were”; ► “for” → “`for`”; ► “nop” → “`nop`”; ► “PS” → further indented under #3. – PJTraill Mar 24 '17 at 10:08
  • 1
    As for the question as to why when you add the `test` function the compiler subtract 16 bytes although less is needed is related to 64-bit Linux ABI requirement that at the point of any function call (like calling `test`) the stack needs to be 16 (and possibly 32-byte aligned). Since the stack is aligned at the point of the call, the CALL itself pushes 8 byte return address. That misaligns the stack by 8. The `push rbp` subtracts another 8 which makes it 16 byte aligned again. Now you need local variable data. Compilers usually allocate enough bytes to maintain the 16-byte alignment – Michael Petch Mar 24 '17 at 10:54
  • 1
    By subtracting 16-bytes, by the time it reaches the call to `test` the stack remains 16-byte aligned and all is good.. – Michael Petch Mar 24 '17 at 10:55
  • 1
    By subtracting 16-bytes, by the time it reaches the call to test the stack remains 16-byte aligned and all is good.. From the ABI I linked to in my first comment the rule is _The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point_ – Michael Petch Mar 24 '17 at 11:02
  • 1
    @PJTraill “title should mention gcc” — **no**. That’s what *tags* are for. [Don’t put tags in the title](https://meta.stackexchange.com/a/130208/1968). – Konrad Rudolph Mar 24 '17 at 12:53
  • compiling and disassembling is better IMO than compiling to asm. but using objdump you do end up disassembling data, so the bytes after ret are possibly padding to align the whole thing or perhaps there as a real nop for branch shadow (pipeline) reasons – old_timer Mar 24 '17 at 13:32
  • @KonradRudolph: thanks for the comment and link, which I shall look at; I suspect you are right – it was my first idea to make the hopelessly vague title express the content of the question more specifically. One limitation of tags: they don’t show up under _Hot Network Questions_, making such a title worse. – PJTraill Mar 24 '17 at 20:04
  • 1
    Please see also my question and its response at http://stackoverflow.com/questions/43013693/compiler-using-local-variables-without-adjusting-sp#43014138. It explains a bit more. – Paul Ogilvie Mar 25 '17 at 08:38
  • @PaulOgilvie Thank you! – Ofey Mar 25 '17 at 11:43

3 Answers3

12

Regarding the second question, since the C99 standard it's allowed to not have an explicit return 0 in the main function, the compiler will add it implicitly. Note that this is only for the main function, no other function.

As for the third question, the rbp register acts as the frame pointer.

Lastly the PS. It's likely that the called function is using 16 bytes (0x10) for the arguments passed to the function. The subtraction is what "removes" those variables from the stack. Could it possibly be two pointers you pass as arguments?

If you're serious learning how compilers in general works, and possibly want to create your own (it's fun! :)), then I suggest you invest in some books about the theory and practice of it. The dragon book is an excellent addition to any programmers bookshelf.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • 1
    plus 1/3 in integer arithmetic – Bathsheba Mar 24 '17 at 08:02
  • 1
    @Bathsheba I didn't even expected that much! :) – Some programmer dude Mar 24 '17 at 08:04
  • It's an excellent answer to (2). How about if we see 3 "plus 1/3" comments, then one of us will upvote? – Bathsheba Mar 24 '17 at 08:05
  • plus 1/3 from me too then. – Ajay Brahmakshatriya Mar 24 '17 at 08:07
  • Also plus 1 for *create your own (it's fun! :))* – Ajay Brahmakshatriya Mar 24 '17 at 08:07
  • 2
    Dunnit. Perhaps "why can't we have a rational type for the number of votes" is a meta question? – Bathsheba Mar 24 '17 at 08:08
  • Hey guys, don't forget Paul's answer, it answers all three questions. :) – Some programmer dude Mar 24 '17 at 08:09
  • 1
    Because we don't want the upvotes to go from 3 to 2.99999999997 – Ajay Brahmakshatriya Mar 24 '17 at 08:10
  • 3
    [What is exactly the base pointer and stack pointer? To what do they point?](http://stackoverflow.com/q/1395591/995714), [What is the purpose of the EBP frame pointer register?](http://stackoverflow.com/q/579262/995714) – phuclv Mar 24 '17 at 08:22
  • Thank you, but I think that sub **add** space to the stack(I'm wrong?), if I call a custom function `void test(){}` it always allocate space on the stack, but 16 byte it's too much(I think). Ideally it only have to put the pointer of the function on the stack then calling it, but the pointer is only 8 bytes, that's why I don't fully uderstand the `sub rsp,0x10` – Ofey Mar 24 '17 at 08:58
  • 1
    @Ofrey : By adding a calling to another function the code can't take advantage of the red zone (a function that calls another function is no longer a leaf function - see my commnt under your question). That is why you see GCC generate `sub rsp,0x10` appear when you add the call to `test` – Michael Petch Mar 24 '17 at 10:37
  • It could be interresting to read my question and its response at http://stackoverflow.com/questions/43013693/compiler-using-local-variables-without-adjusting-sp#43014138 – Paul Ogilvie Mar 25 '17 at 08:36
6

Anything after the ret cannot be relied on to be code. Decoding as nop means "No OPeration"

The 2nd point is the compiler detecting you leave the main function without returning a value and it inserts a return 0 (only defined for main).

The rbp register, with bp meaning "Base Pointer", points to the stack frame of the currect function. A function call often results in the function entry saving rbp and using the current value of rsp for rbp. Fetching/storing function arguments and local variables are done relative to rbp.


I think your third question needs some more attention, "Why doesn't the compiler allocate space on the stack with sub rsp,0x10? Why doesn't it use the rbp register for referencing local stack data?"

Actually, the compiler does allocate space on the stack. But it does not change the stackpointer. It can do that because the functon calls no other functions. It just uses space below the curent sp (the stack grows down) and it uses rbp to access i ([rbp-0x8]) and k ([rbp-0x4]).


I must add the following note: not adjusting sp for the use of local variables seems not interrupt safe and so the compiler relies on the hardware automatically switching to a system stack when interrupts occur. Otherwise, the first interrupt that came along would push the instruction pointer onto the stack and would overwrite the local variable.

Question of interrupts solved in Compiler using local variables without adjusting RSP

Community
  • 1
  • 1
Paul Ogilvie
  • 25,048
  • 4
  • 23
  • 41
  • After the ret call, then the compiler just throw random opcode? – Ofey Mar 24 '17 at 09:00
  • 5
    @Ofey : What hasn't been captured by any of the answers here is the why that NOP exists. In most cases when it is after a function (after the `ret` in this case), it has been placed there so that the next function starts on a 16-byte boundary. Take the address at the start of the NOP (0x4004fb in your objdump) and add the 5 bytes the NOP takes. You get 0x400500. 0x400500 is evenly divisible by 16.This is done for performance reasons (related to cache lines).Likely in your objdump you have a function that appears right after `main`.You'll see NOPs in a function to align loops like this as well. – Michael Petch Mar 24 '17 at 09:19
  • @MichaelPetch Thank you! Usually when I see `nop` I always think about alignment, in this case I wasn't able to understand where to align. Thank you again! – Ofey Mar 24 '17 at 09:38
  • I also used `nop`s to patch an executable. – Paul Ogilvie Mar 24 '17 at 09:39
  • @PaulOgilvie Me also,during some crackme challenges. – Ofey Mar 24 '17 at 09:51
6
  1. Yes, the nop is for alignment. Compilers use different instructions for different lengths of padding needed, knowing that modern CPU will be pre-fetching and decoding several instructions ahead.

  2. As others have said, the C99 standard returns 0 from main() by default if there's no explicit return statement (see 5.1.2.2.3 in C99 TC3), so no warning is raised.

  3. The 64-bit System V Linux ABI reserves a 128-byte "red zone" below the current stack pointer that leaf functions (functions that do not call any other functions - and your main() is one such) can use for local variables and other scratch values without having to sub rsp / add rsp. And so rbp == rsp.

And for the PS: when you call a function in the for() loop (or anywhere in your main()), main() is no longer a leaf function, so the compiler can no longer use the red zone. That's why the it allocates space on the stack with sub rsp, 0x10. However, it knows the relationship between rsp and rbp, so it can use either when accessing data.

user7761803
  • 209
  • 2
  • 9