
I'm reading Programming from the Ground Up. PDF: http://mirror.ossplanet.net/nongnu/pgubook/ProgrammingGroundUp-0-8.pdf

I'm curious about the part on page 37 about reserving space for local variables. The author says we need 2 words of memory, so we move the stack pointer down 2 words by executing subl $8, %esp. So far, I think I understand.

Then I wrote some C code to verify this reserved space.

#include <stdio.h>

int test(int a1, int a2, int a3, int a4, int a5, int a6, int a7, int a8, int a9, int a10, int a11, int a12) {
    printf("a1=%#x, a2=%#x, a3=%#x, a4=%#x, a5=%#x, a6=%#x, a7=%#x, a8=%#x, a9=%#x, a10=%#x, a11=%#x, a12=%#x", a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12);

    return 0;
}

int main(void){
    test(0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x10, 0x11, 0x12);
    printf("Wick is me!");

    return 0;
}

Then I compiled it into an executable with gcc -Og -g and ran it in the gdb debugger.

I ran disass on the main function and copied part of the assembly below.

   0x000055555555519d <+0>: endbr64 
   0x00005555555551a1 <+4>: sub    $0x8,%rsp  # reserve space?
   0x00005555555551a5 <+8>: pushq  $0x12
   0x00005555555551a7 <+10>:    pushq  $0x11
   0x00005555555551a9 <+12>:    pushq  $0x10
   0x00005555555551ab <+14>:    pushq  $0x9
   0x00005555555551ad <+16>:    pushq  $0x8
   0x00005555555551af <+18>:    pushq  $0x7
   0x00000000000011b1 <+20>:    mov    $0x6,%r9d
   0x00000000000011b7 <+26>:    mov    $0x5,%r8d
   0x00000000000011bd <+32>:    mov    $0x4,%ecx
   0x00000000000011c2 <+37>:    mov    $0x3,%edx
   0x00000000000011c7 <+42>:    mov    $0x2,%esi
   0x00000000000011cc <+47>:    mov    $0x1,%edi
   0x00000000000011d1 <+52>:    callq  0x1149 <test>
   0x00000000000011d6 <+57>:    add    $0x30,%rsp
   0x00000000000011da <+61>:    lea    0xe89(%rip),%rsi        # 0x206a
   0x00000000000011e1 <+68>:    mov    $0x1,%edi
   0x00000000000011e6 <+73>:    mov    $0x0,%eax
   0x00000000000011eb <+78>:    callq  0x1050 <__printf_chk@plt>
   0x00000000000011f0 <+83>:    mov    $0x0,%eax
   0x00000000000011f5 <+88>:    add    $0x8,%rsp
   0x00005555555551f9 <+92>:    retq

I doubt that this is the instruction that reserves space, so I stepped through the assembly one instruction at a time and checked the contents of the stack.

Why does this instruction subtract only 8 bytes? Also, 0x7fffffffe390 seems to hold main's return address. Shouldn't this be the reserved space?

Below is the memory near the RSP address (from i r $rsp, then x/40xb on that address):

0x7fffffffe390: 0x00    0x52    0x55    0x55    0x55    0x55    0x00    0x00   => after sub
0x7fffffffe398: 0xb3    0x20    0xdf    0xf7    0xff    0x7f    0x00    0x00   => before sub

Then I executed all the pushq instructions and ran x/64xb 0x7fffffffe360:

0x7fffffffe360: 0x07    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe368: 0x08    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe370: 0x09    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe378: 0x10    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe380: 0x11    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffe388: 0x12    0x00    0x00    0x00    0x00    0x00    0x00    0x00

Above are the local variables.
==========================

0x7fffffffe390: 0x00    0x52    0x55    0x55    0x55    0x55    0x00    0x00
0x7fffffffe398: 0xb3    0x20    0xdf    0xf7    0xff    0x7f    0x00    0x00

I think 0x7fffffffe390 to 0x7fffffffe398 is the space reserved for local variables, but it doesn't change! Is my way of testing wrong?

Execution environment:

  • GDB version: 9.2
  • GCC version: 9.4.0
  • os: x86_64 GNU/Linux
OnlyWick
    Maybe to align stack to meet requirement for function call? – MikeCAT Jun 14 '22 at 12:29
  • Does compiling with `-m32` make a difference? – Ted Lyngmo Jun 14 '22 at 12:31
  • You compiled with optimization enabled, and your function args aren't `volatile`, so GCC didn't spill them all to RAM. So they stayed in their incoming registers and stack space. The only stack-pointer adjustment here is to get 16-byte alignment before the `call`, as @MikeCAT said. – Peter Cordes Jun 14 '22 at 13:07
    BTW, since you're following the PGU book, you probably want to use `-m32`. When PGU was written, CPUs didn't decode `push` / `pop` as efficiently, so it was common for compilers to allocate space once for outgoing args (`-maccumulate-outgoing-args`) and use `mov` stores, instead of `push`, to pass stack args. So `-m32` alone on a modern CPU won't change the fact that there isn't a big `sub $..., %esp` (@TedLyngmo), only enough for 16-byte alignment. – Peter Cordes Jun 14 '22 at 13:10
  • @PeterCordes Thanks! Close, but no cigar. :-) – Ted Lyngmo Jun 14 '22 at 13:16
  • @PeterCordes Why align before the call? How should I do it? If I use `-m32`, will that hurt my learning of x86_64? I'm a rookie at hardware and asm; I just want to learn basic computer knowledge. – OnlyWick Jun 14 '22 at 13:25
    @OnlyWick: Because that's what the calling convention requires / guarantees. They had to pick before vs. after, and before means the stack args (if any) are aligned by 16. Or at least the first one is. ([Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/q/49391001)) – Peter Cordes Jun 14 '22 at 13:56
  • Learn the mode the book is for. Once you understand the basics of 32-bit mode, it's relatively easy to learn [what changes for 64-bit mode](https://web.archive.org/web/20160609221003/http://www.x86-64.org/documentation/assembly.html). Trying to port a tutorial on the fly is easy if you know 32-bit mode and 64-bit mode already. But if you actually need to learn what the tutorial was trying to teach in the first place, [you won't know which parts need to change and which don't!](https://stackoverflow.com/questions/1175375/how-does-an-os-affect-how-assembly-code-runs/72569455#72569455) – Peter Cordes Jun 14 '22 at 13:59
  • @PeterCordes Is it aligned by 16 in every situation? In another case of mine, `sub $0x8,%rsp` appears. These problems make me dizzy... – OnlyWick Jun 14 '22 at 14:38
    The calling convention only cares about the boundaries between functions (right before/after a call, and right before/after a ret). But if GCC moves the stack pointer at all after a function prologue (where it might push some call-preserved registers like RBX), it will make RSP a multiple of 16. Unless it's setting up for an odd number of pushes for a function call with some stack args, like here. It ended up combining both, or something, or it's just a missed optimization. – Peter Cordes Jun 14 '22 at 14:43
  • @PeterCordes I have another question in mind (asking in advance). Right now I use disass to read the asm code, so I see the `sub $0x8,%rsp`, but when I write asm code myself, how should I know how much to subtract? – OnlyWick Jun 15 '22 at 10:56
  • Count the pushes you want to do (for args and/or saving registers), and if it's not odd, you need an extra 8 byte adjustment. Either a dummy push, or a sub from RSP. – Peter Cordes Jun 15 '22 at 11:27
  • 6x pushq is 6x8 = 48 bytes, an even multiple of 8. Like I said, an even number of pushes so you need one more. On function entry, RSP % 16 == 8, but before a call you need RSP % 16 == 0. (`call` itself changes RSP by 8.) – Peter Cordes Jun 15 '22 at 20:17
  • @PeterCordes Now suppose there are 5 pushq: 5x8 = 40 bytes, an odd multiple of 8, so I don't need an extra 8-byte adjustment, but the result I get is `sub $0x10, %rsp`... I seem to be stuck. :-( PGU seems to leave this out; where should I learn it? – OnlyWick Jun 16 '22 at 07:56
  • PSkocik's answer on this question already explains that the `sub $0x10, %rsp` is a missed optimization by GCC, not necessary for anything, assuming you're talking about calling a function that takes 5 stack args (11 total). That's the only "weird" thing here, the rest all follows from the calling convention / ABI requirements. – Peter Cordes Jun 16 '22 at 08:10
  • @PeterCordes So with 5 pushq, when I write asm code, should I use `sub $40, %rsp`? Or not use a `sub` instruction at all? – OnlyWick Jun 16 '22 at 08:21
  • What? If you use 5 pushes total, you don't need `sub` at all, unless you separately want other stack space for locals. Or if you want to use `mov` to store args, instead of `push`. – Peter Cordes Jun 16 '22 at 10:40
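The counting rule from the comments above can be checked with a little modular arithmetic. This is a minimal sketch, not part of any library: `rsp_mod16_before_call` is a hypothetical name, and it assumes the function was entered by a normal `call` (so RSP % 16 == 8 on entry) and that every push is 8 bytes.

```c
#include <assert.h>

/* Hypothetical helper (not from any library): model RSP modulo 16 inside a
 * System V x86-64 function that is about to call another function.
 *
 * Assumption: on entry RSP % 16 == 8, because the caller's `call` pushed an
 * 8-byte return address onto a 16-byte-aligned stack.
 * `sub_bytes` is any explicit `sub $N, %rsp`; `pushes` counts 8-byte pushes
 * (saved registers and stack arguments). Returns RSP % 16 right before the
 * `call`; the ABI requires 0. */
int rsp_mod16_before_call(int sub_bytes, int pushes)
{
    assert(sub_bytes >= 0 && pushes >= 0);
    int moved = sub_bytes + 8 * pushes;   /* bytes the stack grew downward */
    /* (8 - moved) mod 16, with +16 so the result is never negative */
    return (8 - moved % 16 + 16) % 16;
}
```

With the question's `main` (`sub $0x8` plus six pushes), `rsp_mod16_before_call(8, 6)` is 0, while six pushes with no `sub` give 8 (misaligned). Five pushes with no `sub` also give 0, and so do five pushes plus GCC's `sub $0x10,%rsp`; that last one is aligned, just 16 bytes larger than necessary.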

3 Answers


The x86-64 SysV ABI requires that the stack be 16-byte aligned at the time of a call. Since a call instruction pushes an 8-byte return address onto the stack, the stack is always misaligned by 8 at the start of a function, and if a nested call is to be made, the caller needs to push an odd number of eight-bytes onto the stack to make it 16-aligned again.

Since your function takes 12 integer arguments, 6 of which go on the stack as eight-bytes each, an extra 8 bytes need to be pushed (or subtracted from RSP) before the stack arguments so the stack is 16-aligned before the call.

If your function took 11 arguments (or any other count made of 6 register arguments plus an odd number of stack arguments), then no extra stack adjustment should be needed.
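The rule in this answer can be written as a small function. It is a sketch under stated assumptions: the caller pushes nothing besides the stack arguments (no saved registers, no locals), all arguments are integers or pointers, and `padding_before_call` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: bytes of extra padding a caller needs before calling
 * a function that takes `nargs` integer/pointer arguments (x86-64 System V).
 * Assumes the caller pushes nothing else (no saved registers, no locals) and
 * starts from the function-entry state RSP % 16 == 8. */
int padding_before_call(int nargs)
{
    assert(nargs >= 0);
    int stack_args = nargs > 6 ? nargs - 6 : 0;  /* first six go in registers */
    /* Each stack arg is one 8-byte push. The entry misalignment is 8, so an
     * odd number of pushes restores alignment and an even number needs 8 more. */
    return (stack_args % 2 == 1) ? 0 : 8;
}
```

`padding_before_call(12)` gives 8, matching the `sub $0x8,%rsp` in the question, while `padding_before_call(11)` gives 0.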

GCC and clang still weirdly generate sub rsp, 16 (GCC) and push rax; sub rsp, 8 (clang) for that case (https://gcc.godbolt.org/z/jGj5WPq8c). I don't understand why.

Petr Skocik
    I think the extra 16 bytes of space that GCC and clang are reserving is a missed-optimization bug. For GCC, there's a somewhat well-known bug: [Why does GCC allocate more space than necessary on the stack, beyond what's needed for alignment?](//stackoverflow.com/q/63009070) But this might be a different cause in the compiler internals since clang had the same problem. Especially strange that clang used one push and one sub; that's just the worst of both worlds. But perhaps revealing: as if it wanted to realign RSP on function entry, and then separately had to set up for an odd-args call. – Peter Cordes Jun 14 '22 at 14:24

Update:

I misread -Og as -O0. With optimization on, there are several additional complications (such as exactly how GCC chooses to pass arguments, whether it reserves space for locals at all or keeps those locals in registers, etc.).

To understand what's going on, you should first understand the picture without optimizations.


Where is the reserved space?

There are several ways to "reserve space" on stack on x86_64:

  • push ...
  • sub ...,%rsp
  • enter ...

There are also several ways to "unreserve" it: pop ..., add ...,%rsp, leave.

In your case, it's the pushq instruction which simultaneously puts a value into the stack slot and reserves space for that value.
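To see why one pushq both reserves a slot and fills it, here is a toy C model of a downward-growing stack. Names like `toy_push` are invented for this sketch; it counts in 8-byte slots rather than byte addresses and only mimics the pointer arithmetic, not the real RSP.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a downward-growing stack, for illustration only.
 * `sp` indexes `mem` in 8-byte slots; real hardware keeps a byte address in RSP. */
#define SLOTS 64

struct toy_stack {
    uint64_t mem[SLOTS];
    int sp;              /* index of the current top slot; SLOTS means empty */
};

/* Like `pushq`: move the pointer down one slot (reserve) and store a value. */
void toy_push(struct toy_stack *s, uint64_t v)
{
    assert(s->sp > 0);
    s->sp -= 1;
    s->mem[s->sp] = v;
}

/* Like `sub $8*n, %rsp`: reserve n slots without writing them. */
void toy_reserve(struct toy_stack *s, int n)
{
    assert(n >= 0 && s->sp >= n);
    s->sp -= n;
}

/* Like `add $8*n, %rsp`: release n slots; their old contents stay in memory. */
void toy_release(struct toy_stack *s, int n)
{
    assert(n >= 0 && s->sp + n <= SLOTS);
    s->sp += n;
}
```

Replaying the question's `main` (reserve one slot, then push 0x12 down to 0x7) leaves the pointer seven slots down with 0x7 on top. Releasing six slots, like `add $0x30,%rsp`, moves the pointer back up but does not erase the values, which is why stale data can still show up in gdb.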

You didn't show what happens just before retq, but I suspect that your "unreserve" looks something like add $68,%rsp.

P.S. You have a sequence of 0x01, 0x02 ..., 0x09, 0x10, .... Note that these are not consecutive numbers: the next number after 0x09 is 0x0a.

Employed Russian
  • I added the asm details. The book says: `the function reserves space on the stack for any local variables it needs. This is done by simply moving the stack pointer out of the way. ` Isn't `0x7fffffffe390~0x7fffffffe398` the space for local variables? Ha, the sequence is just for convenience, not meant as hex. – OnlyWick Jun 14 '22 at 13:07
  • @OnlyWick "The book says ..." -- what the book says is not _incorrect_, it's just _incomplete_. Like I said above, "moving stack pointer" is not the only way to reserve stack space on `x86_64`. It _is_ the only way on most RISC processors though. – Employed Russian Jun 14 '22 at 13:34

Recall that in x86_64, the call instruction does the following:

  1. push the current value of RIP, which is the next instruction that will be executed when the function returns. (which moves RSP down in memory - recall that in x86_64 the stack grows down, thus RBP > RSP).

  2. push the current value of RBP, which is used to help restore the caller's stack frame. (which moves RSP down again)

  3. move the current bottom pointer, RBP, to the current stack pointer, RSP. (effectively this creates a zero sized stack starting at where RSP is currently at)

Thus in the memory dump that you show:

0x7fffffffe390: 0x00    0x52    0x55    0x55    0x55    0x55    0x00    0x00
0x7fffffffe398: 0xb3    0x20    0xdf    0xf7    0xff    0x7f    0x00    0x00

The value at 0x7fffffffe390 is the address of the next function to be executed after the return from main. This instruction is located at 0x0000555555555200 (remember that Intel processors are little-endian, so you have to read the bytes backwards). This memory address is consistent with the other memory values you've shown for the code.

Additionally, the bottom of the stack frame for main (RBP) is located at 0x7ffff7df20b3, which looks consistent with the other stack addresses you've shown.

As soon as the call to main is executed, you enter the prologue of the function, which is the first three lines of the disassembly you have:

0x000055555555519d <+0>: endbr64 
0x00005555555551a1 <+4>: sub    $0x8,%rsp  # reserve space?
0x00005555555551a5 <+8>: pushq  $0x12

The second line, sub $0x8,%rsp, subtracts 8 from the stack pointer, thus forming a new stack frame from RBP to RSP. This space is reserved for local variables (and any other space that might be needed as the function executes).

Next we have a series of pushq's and mov's, and these are all doing the same thing: passing the arguments to test. Recall that

  1. stack arguments are pushed right to left, so the last argument to test is pushed first

  2. the first six arguments are passed in registers in 64-bit code, so a1 -> a6 are passed in the registers that you see.

  3. any arguments beyond the first six are pushed on the stack, so a7 -> a12 are pushed on the stack.
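The argument-passing rules in the list above can be summarized as a lookup. This is an illustrative sketch of the System V x86-64 convention for integer and pointer arguments only (floating-point arguments go in XMM registers instead); `arg_register` and `stack_arg_offset` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* x86-64 System V: the first six integer/pointer arguments travel in these
 * registers. For 32-bit `int` arguments the compiler uses the low halves
 * (edi, esi, edx, ecx, r8d, r9d), as in the question's listing. */
static const char *const arg_regs[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Register for 1-based argument `i`, or NULL if it is passed on the stack. */
const char *arg_register(int i)
{
    assert(i >= 1);
    return i <= 6 ? arg_regs[i - 1] : NULL;
}

/* For a stack argument (i >= 7): byte offset from RSP just before the `call`.
 * Stack args are pushed right to left, so argument 7 ends up lowest, at 0(%rsp). */
int stack_arg_offset(int i)
{
    assert(i >= 7);
    return 8 * (i - 7);
}
```

This matches the question's dump: a7 (0x07) sits at 0x7fffffffe360, the lowest address, and a12 (0x12) at 0x7fffffffe388, 0x28 bytes higher. Inside test, after `call` has pushed the return address, each offset is 8 larger.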

All of your arguments are literals, so there are no local variables and the values are used directly in the pushq's or mov's.

The next bit of assembly is

0x00000000000011d1 <+52>:    callq  0x1149 <test>
0x00000000000011d6 <+57>:    add    $0x30,%rsp
0x00000000000011da <+61>:    lea    0xe89(%rip),%rsi        # 0x206a
0x00000000000011e1 <+68>:    mov    $0x1,%edi
0x00000000000011e6 <+73>:    mov    $0x0,%eax

Here we see the actual call to test. The next instruction cleans up the stack. Recall that we pushed six 8-byte values onto the stack, causing the stack to grow downwards by 48 bytes. Adding 0x30 (48 decimal) effectively removes those 6 values from the stack by moving RSP upward.

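The next two lines set up the arguments for printf (here the fortified __printf_chk, which takes a flag in EDI and the format string in RSI). The line after that, mov $0x0,%eax, zeroes EAX: for a call to a variadic function like printf, the x86-64 System V ABI uses AL to tell the callee how many vector registers carry arguments, and here there are none.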

The last bit of assembly (the memory addresses have changed; I suspect this is from a second run of the code):

0x00000000000011eb <+78>:    callq  0x1050 <__printf_chk@plt>
0x00000000000011f0 <+83>:    mov    $0x0,%eax
0x00000000000011f5 <+88>:    add    $0x8,%rsp
0x00005555555551f9 <+92>:    retq

performs the actual call to printf, then sets EAX to 0 as main's own return value (the return 0;), replacing printf's return value (printf returns an int with the number of characters printed). Finally, add $0x8,%rsp undoes the subtraction performed on line 2 of the disassembly, effectively destroying the stack frame for main. The last line, retq, returns from main.
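The bookkeeping described above (every `sub`/`push` is undone by an `add` before `retq`) can be checked by summing the RSP adjustments in the listing. The deltas below are transcribed by hand from the disassembly in the question, and the helper names are hypothetical; this is a sketch, not a tool.

```c
#include <assert.h>

/* RSP changes made by the question's `main`, in bytes (negative = stack grows).
 * The `call`/`ret` pairs for test and printf net to zero, so they are left out. */
static const int main_rsp_deltas[] = {
    -8,                      /* sub $0x8,%rsp */
    -8, -8, -8, -8, -8, -8,  /* six pushq for a12 .. a7 */
    +0x30,                   /* add $0x30,%rsp after call test */
    +8,                      /* add $0x8,%rsp before retq */
};

/* Sum the deltas; 0 means every reserved byte was released before `ret`. */
int net_rsp_change(const int *deltas, int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += deltas[i];
    return total;
}

/* Deepest point relative to the entry RSP (most negative running total). */
int deepest_rsp_offset(const int *deltas, int n)
{
    assert(n >= 0);
    int total = 0, deepest = 0;
    for (int i = 0; i < n; i++) {
        total += deltas[i];
        if (total < deepest)
            deepest = total;
    }
    return deepest;
}
```

For this listing the net change is 0 and the deepest point is 56 bytes below the entry RSP, which is where `call test` happens.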

You are correct in that sub $0x8,%rsp is reserving 8 bytes for local variables (or intermediate values). However, main does not use any local variables, so nothing is going to change.

As a test, you could add a few local variables to main:

int a = 5, b = 10, c;
c = 3*a + 2*b;

printf("Wick is me %d\n", c);   // <--- note modification in this line

In this case you should see some change in the value subtracted from RSP in line 2. We would expect an additional 24 bytes of stack space to be needed; however, it can differ for a few reasons:

  1. The results of the calculations 3*a and 2*b need to be stored somewhere, either on the stack or in registers.
  2. The values of a and b are literals and may be stored in registers.
  3. The compiler might be able to deduce that 3*a + 2*b is a constant and perform the math at compile time, optimize away both a and b, and just set c to 35.

Using -O0 or -Og as well as using -m32 (forcing code for a 32-bit processor) might remove some of these issues.
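If you want the locals to show up in the disassembly even under -Og, one option (echoing Peter Cordes's remark about `volatile` in the comments) is to declare them `volatile`. This is a hedged sketch: `locals_demo` is a hypothetical name, and the exact code GCC emits depends on the version and flags, so compare `gcc -Og -S` output with and without `volatile`.

```c
/* Hypothetical variant of the suggested test. `volatile` tells the compiler
 * that every access to a and b must really touch memory, so it cannot fold
 * 3*a + 2*b into the constant 35 at compile time. Because this function
 * makes no calls, GCC may put a and b in the red zone below RSP (128 bytes
 * that leaf functions may use in the System V x86-64 ABI) without any
 * `sub ...,%rsp` at all. */
int locals_demo(void)
{
    volatile int a = 5, b = 10;
    int c = 3 * a + 2 * b;
    return c;
}
```

The red-zone detail is one more reason a missing `sub` does not necessarily mean there are no stack locals.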

thurizas
  • `call` doesn't touch RBP. It only pushes a return address and sets a new RIP. The stuff with RBP you mention takes extra instructions in the prologue of a function you call, if you choose to spend extra instructions setting up a legacy frame-pointer. (gcc -Og or higher won't.) Also, you have step (3) backwards. In AT&T syntax, it's `mov %rsp, %rbp`, setting RBP = RSP, not vice versa. (That's why in step 2 we save the old RBP, because we're about to modify it.) Or did you mean the English word move, as in change RBP so it's now equal to RSP? I assume so, after decoding your phrasing. – Peter Cordes Jun 14 '22 at 14:13