The callee doesn't need to pop them (and shouldn't); it just loads them. You can leave them in memory until you're ready for them. There's no need to save/restore more call-preserved registers for them unless you actually want more data live in registers at once. For example:
int foo(int a, int b, int c, int d, int e, int f) {
    return a+b+c+d+e+f;
}
compiles with ARM GCC -O3 to the following (https://godbolt.org/z/v6v6ch4vf):
foo(int, int, int, int, int, int):
        add     r0, r0, r1      @ a += b
        ldr     r1, [sp]        @ load e into r1, since b is now dead
        add     r0, r0, r2      @ a += c
        ldr     r2, [sp, #4]    @ load f into r2, since c is now dead
        add     r0, r0, r3      @ a += d
        add     r0, r0, r1      @ a += e
        add     r0, r0, r2      @ a += f
        bx      lr              @ return with the sum in r0
Scheduling those loads ASAP lets the compiler hide some load-use latency by doing each load as far ahead of the use of the load result as possible.
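If you want to experiment with more stack args, here is a minimal sketch with eight int args. The name foo8 is made up for illustration, and the offsets in the comments assume the standard AAPCS layout at function entry (first four int args in r0..r3, the rest on the stack in order); I haven't shown what GCC actually emits for it, so paste it into a compiler explorer to see how the loads get scheduled.

```c
/* Hypothetical 8-arg variant of foo, for experimenting.
 * Assuming AAPCS with all-int args, on entry:
 *   a..d are in r0..r3
 *   e is at [sp], f at [sp, #4], g at [sp, #8], h at [sp, #12]
 * The callee only needs to load e..h; it never has to adjust sp. */
int foo8(int a, int b, int c, int d, int e, int f, int g, int h) {
    return a + b + c + d + e + f + g + h;
}
```

With only r0..r3 (and possibly r12) free to clobber, the compiler has to wait for some of the register args to be dead before it has somewhere to load the later stack args into.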
r0..r3 and r12 (see footnote 1) are the only call-clobbered registers in the version of the standard AAPCS calling convention GCC is using; see "What registers to save in the ARM C calling convention?". Well, LR is call-clobbered too, but if we did clobber it we'd have to save/restore the return address.
The stack pointer is call-preserved; it must be the same when control returns to the caller as it was on entry to the function. (In other words, stack args if any are caller-pops.)
So if you did actually pop some stack args (e.g. if you were done with r0..r3 and could pop into some of those registers), you'd need to sub sp, #8 (for two popped words) at some point before returning to restore it.
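To see why SP being call-preserved matters from the caller's side, here is a rough sketch with a made-up caller bar that calls foo twice. The noinline attribute is a GCC/Clang extension, used here only so the calls don't get optimized away; the comments describe what the AAPCS convention requires, not the exact code any particular compiler emits.

```c
/* Same 6-arg function as in the example above.
 * noinline (GCC/Clang extension) keeps the calls as real calls. */
__attribute__((noinline))
int foo(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;
}

/* Hypothetical caller, for illustration only. */
int bar(int x) {
    /* Caller puts a..d in r0..r3 and stores e=4, f=5 to [sp], [sp, #4]. */
    int first = foo(x, 1, 2, 3, 4, 5);

    /* foo returned with sp unchanged, so the caller can store e=6, f=7
     * into the same two stack slots without re-adjusting sp. */
    int second = foo(first, 1, 2, 3, 6, 7);

    return second;
}
```

If foo returned with SP 8 bytes higher than on entry, the caller's idea of where its own locals and outgoing-arg slots live would be wrong after the first call.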
Footnote 1: I don't know why GCC isn't doing an earlier load into r12 (ip); even with -marm to make sure it's targeting ARM mode, it doesn't use it. (In Thumb mode, the high 8 registers are not usable with many instructions.) GCC doesn't think it's call-preserved: even forcing it to copy a value to that register with register int y asm("r12") and an asm statement doesn't make it save/restore that register. https://godbolt.org/z/GMhbMscqT