I'm reading "Computer Systems: A Programmer's Perspective, 3/E" (CS:APP3e) and the following code is an example from the book:
long call_proc() {
long x1 = 1;
int x2 = 2;
short x3 = 3;
char x4 = 4;
proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
return (x1+x2)*(x3-x4);
}
The book gives the assembly code generated by GCC:
long call_proc()
call_proc:
; Set up arguments to proc
subq $32, %rsp ; Allocate 32-byte stack frame
movq $1, 24(%rsp) ; Store 1 in &x1
movl $2, 20(%rsp) ; Store 2 in &x2
movw $3, 18(%rsp) ; Store 3 in &x3
movb $4, 17(%rsp) ; Store 4 in &x4
leaq 17(%rsp), %rax ; Create &x4
movq %rax, 8(%rsp) ; Store &x4 as argument 8
movl $4, (%rsp) ; Store 4 as argument 7
leaq 18(%rsp), %r9 ; Pass &x3 as argument 6
movl $3, %r8d ; Pass 3 as argument 5
leaq 20(%rsp), %rcx ; Pass &x2 as argument 4
movl $2, %edx ; Pass 2 as argument 3
leaq 24(%rsp), %rsi ; Pass &x1 as argument 2
movl $1, %edi ; Pass 1 as argument 1
; Call proc
call proc
; Retrieve changes to memory
movslq 20(%rsp), %rdx ; Get x2 and convert to long
addq 24(%rsp), %rdx ; Compute x1+x2
movswl 18(%rsp), %eax ; Get x3 and convert to int
movsbl 17(%rsp), %ecx ; Get x4 and convert to int
subl %ecx, %eax ; Compute x3-x4
cltq ; Convert to long
imulq %rdx, %rax ; Compute (x1+x2) * (x3-x4)
addq $32, %rsp ; Deallocate stack frame
ret ; Return
I can understand this code: the compiler allocates 32 bytes of space on the stack, of which the first 16 bytes hold the arguments passed to proc
and the last 16 bytes hold 4 local variables.
Then I tested this code on GCC 11.2, using the optimization flag -Og
, and got this assembly code:
call_proc():
subq $24, %rsp
movq $1, 8(%rsp)
movl $2, 4(%rsp)
movw $3, 2(%rsp)
movb $4, 1(%rsp)
leaq 1(%rsp), %rax
pushq %rax
pushq $4
leaq 18(%rsp), %r9
movl $3, %r8d
leaq 20(%rsp), %rcx
movl $2, %edx
leaq 24(%rsp), %rsi
movl $1, %edi
call proc(long, long*, int, int*, short, short*, char, char*)
movslq 20(%rsp), %rax
addq 24(%rsp), %rax
movswl 18(%rsp), %edx
movsbl 17(%rsp), %ecx
subl %ecx, %edx
movslq %edx, %rdx
imulq %rdx, %rax
addq $40, %rsp
ret
I noticed that gcc first allocated 24 bytes for 4 local variables. Then it uses pushq
to add 2 arguments to the stack, so the final code uses addq $40, %rsp
to free stack space.
Compared to the code in the book, GCC allocates 8 more bytes of space here, and it doesn't seem to use the extra space. Why does it need the extra space?