arr[i]; as a C statement compiles to zero instructions, even in a debug build, since arr isn't volatile. All the asm you're seeing is just locals at fixed locations.
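To make that concrete, here is a minimal sketch with two hypothetical functions (plain_read and volatile_read are names made up for this example, not taken from the question). With a plain int*, the unused read can be dropped entirely; with a pointer to volatile int, the access counts as a side effect, so the compiler has to emit a load even when optimizing.

```c
// Hypothetical names for illustration; meant for reading the asm (e.g. on Godbolt).

// Non-volatile: the value is never used, so no load needs to be emitted.
int plain_read(int *arr, int i) {
    (void)arr[i];   // same as `arr[i];`, the cast only silences -Wunused-value
    return 0;
}

// Volatile: every access is an observable side effect, so a load is emitted.
int volatile_read(volatile int *arr, int i) {
    arr[i];
    return 0;
}
```

Comparing the asm for the two functions should show a load from memory only in volatile_read.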
If you're used to looking at AT&T syntax, use the output dropdown on Godbolt to uncheck "Intel Syntax".
Also, you could get useful asm to look at by writing a function that takes a pointer arg and sums the array it points to, for example, so you can enable light optimization without having the array optimize away to just returning a constant. You could use volatile to force the compiler not to optimize the array away, if you really want to see asm that stores array initializers to the stack instead of just taking a function arg. You're writing this to look at the asm, not run it, so don't write a main(). (In general see How to remove "noise" from GCC/clang assembly output?)
int sumarr(int *arr){
    register int sum = 0;   // register keyword is useful in GCC if you're going to disable optimization
    for (int i=0 ; i<1024 ; i++){
        sum += arr[i];
    }
    return sum;
}
On Godbolt, GCC 12 with -Og uses an indexed addressing mode like you expect; most other optimization levels do a pointer increment.
# gcc -Og
sumarr:
        movl    $0, %eax
        movl    $0, %edx
        jmp     .L2                    # jump to the loop condition; redundant because 0 <= 1023 is known true,
                                       # so it could just fall through into the loop, but -Og maybe intentionally
                                       # preserves that execution? It doesn't in general give consistent debugging.
.L3:                                   # do {
        movslq  %eax, %rcx             #   sign-extend int to intptr_t since -Og doesn't optimize much
        addl    (%rdi,%rcx,4), %edx    #   edx += *(rdi + rcx*4), i.e. arr[i]
        addl    $1, %eax
.L2:                                   # loop entry point for first iteration
        cmpl    $1023, %eax
        jle     .L3                    # }while(i<=1023)
        movl    %edx, %eax             # at low optimization levels, GCC didn't sum into the return-value register in the first place.
        ret
With a normal level of optimization like -O2, we get a pointer increment:
# GCC -O2 -fno-tree-vectorize   (-O2 by itself would vectorize with SIMD)
sumarr:
        leaq    4096(%rdi), %rdx       # endp = ptr + len
        xorl    %eax, %eax             # sum = 0
.L2:                                   # do {
        addl    (%rdi), %eax
        addq    $4, %rdi
        cmpq    %rdx, %rdi
        jne     .L2                    # }while(p != endp)
        ret
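As a rough C-level picture of the two loop shapes above (a sketch, not compiler output; sum_indexed and sum_ptr are hypothetical names), the -Og code keeps an integer index and scales it on every access, while the -O2 code walks a pointer up to a precomputed end address. Both compute the same sum.

```c
// Mirrors the -Og loop: integer counter, address computed as arr + i*4 each iteration.
int sum_indexed(const int *arr) {
    int sum = 0;
    for (int i = 0; i <= 1023; i++) {
        sum += arr[i];              // addl (%rdi,%rcx,4), %edx
    }
    return sum;
}

// Mirrors the -O2 loop: do { ... } while (p != endp), with no separate counter.
int sum_ptr(const int *arr) {
    const int *endp = arr + 1024;   // leaq 4096(%rdi), %rdx
    int sum = 0;
    do {
        sum += *arr;                // addl (%rdi), %eax
        arr++;                      // addq $4, %rdi
    } while (arr != endp);          // cmpq %rdx, %rdi ; jne
    return sum;
}
```

This only illustrates what the asm is doing. An optimizing compiler is free to turn either source form into either asm form, so how you write the loop in C doesn't reliably control which one you get.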
> The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes. In reality it should be 0, but it needs to be a multiple of 16?
That's an over-simplification, but yes, in this case 0*16 = 0 is a multiple of 16, which maintains stack alignment for RSP after push rbp. The actual space it uses is in the red zone, the 128 bytes below RSP.
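The amount of stack space reserved for locals is always a multiple of 8, either an odd or an even multiple depending on how many pushes the function did, so the total movement of RSP is 16*n + 8. (On function entry RSP is 8 bytes away from a 16-byte boundary, because the call pushed a return address, so moving it by 16*n + 8 bytes re-aligns it.)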
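That arithmetic can be written out as a small model. This is a sketch of the alignment rule only, not GCC's actual frame-layout algorithm, and local_reservation is a hypothetical helper. It assumes each push moves RSP by 8 bytes and that the goal is a total RSP movement of 16*n + 8 bytes.

```c
#include <stddef.h>

// Smallest reservation >= `needed` bytes such that
// 8*pushes + reservation == 16*n + 8 for some integer n.
size_t local_reservation(size_t pushes, size_t needed) {
    size_t r = (needed + 7) & ~(size_t)7;   // round up to a multiple of 8
    if ((8 * pushes + r) % 16 != 8) {
        r += 8;                             // switch between odd and even multiple of 8
    }
    return r;
}
```

With one push (just RBP) and nothing to reserve, local_reservation(1, 0) is 0, which is the case discussed here; with two pushes and nothing to reserve it would be 8.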
In this case, where it doesn't need to preserve any call-preserved registers like RBX, it only pushed RBP (because -fno-omit-frame-pointer is the default at -O0).
It doesn't sub rsp, 16 because the x86-64 SysV ABI includes a red zone: the 128 bytes below RSP are safe from signal handlers or anything else stepping on them asynchronously, so they can be used without "reserving" them. Use -mno-red-zone to stop GCC doing that, if you want to see it reserve space for an array.
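A minimal sketch of a function to try that with (local_array_sum is a hypothetical name, and the array size and values are arbitrary). The array is volatile so it isn't optimized away if you also turn on optimization. There's no main() because it's meant for reading the asm, for example with gcc -O0 -mno-red-zone -S.

```c
// Hypothetical example for inspecting stack-frame setup in the asm output.
int local_array_sum(void) {
    volatile int arr[4] = {10, 20, 30, 40};   // volatile: the stores to the stack must happen
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += arr[i];                        // volatile: each element is actually loaded
    }
    return sum;
}
```

With -mno-red-zone you should see RSP being adjusted to make room for the locals; without it, a leaf function like this can keep them in the red zone below RSP.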