ARM calling convention for more than 4 params: conflicting information

Question

I'm trying to understand the ARM calling convention on Linux, under gcc, especially with more than 4 params. Stack Overflow, Wikipedia, and the ARM docs sometimes give conflicting information.

I understand that additional params are pushed on the stack. Who takes them off the stack? I cannot find this discussed, although it seems the caller needs to.
The callee eventually needs to get these into registers. But using the pop instruction seems wrong, because to pop into a register, the callee must first push that register to save its old value. So, instead, the callee needs to do something like:

@ This func takes 6 args
push {r4, r5, r6, lr} @ We don't use r6, but need to keep the stack 8 byte aligned
ldr r4, [sp, #0xc]
ldr r5, [sp, #8]
...
pop {lr, r6, r5, r4}
bx lr

Is that correct?

Show the code that is being compiled and show the asm before you modified it. No, you do not have to push a register to pop it: pushing puts stuff on the stack, popping takes stuff off. You can also simply modify the stack pointer instead of pushing or popping. Seeing the compiled code, we can explain it better. Also note that, depending on the variable types, fewer than four parameters may fit in registers before the stack is used. — old_timer, Jan 14 '22 at 13:48

Peter Cordes · Answer 1 · 2022-01-19T16:13:00.573

The callee doesn't need to pop them (and shouldn't), just load them. You can leave them in memory until you're ready for them, there's no need to save/restore more call-preserved registers for them unless you actually want to have more data live in registers at once. e.g.

int foo(int a, int b, int c, int d, int e, int f) {
    return a+b+c+d+e+f;
}

compiles with ARM GCC -O3 (https://godbolt.org/z/v6v6ch4vf)

foo(int, int, int, int, int, int):
        add     r0, r0, r1       @ a+=b
        ldr     r1, [sp]         @ load e into r1, since b is now dead
        add     r0, r0, r2       @ a+=c
        ldr     r2, [sp, #4]     @ load f since c is now dead.
        add     r0, r0, r3       @ a+=d
        add     r0, r0, r1       @ a+=e
        add     r0, r0, r2       @ a+=f
        bx      lr

Scheduling those loads ASAP lets the compiler hide some load-use latency: each load is done as far ahead of the use of its result as possible.

r0..r3 and r12¹ are the only call-clobbered general-purpose registers in the version of the standard AAPCS calling convention GCC is using (see the Q&A "What registers to save in the ARM C calling convention?"). LR is technically call-clobbered as well, but if we clobbered it we'd have to save/restore the return address.

The stack pointer is call-preserved; it must be the same when control returns to the caller as it was on entry to the function. (In other words, stack args if any are caller-pops.)

So if you did actually pop some stack args (e.g. if you were done with r0..r3 and could pop into some of those registers), you'd need to sub sp, #8 at some point before returning to restore SP.

Footnote 1: I don't know why GCC isn't doing an earlier load into r12 (ip); even with -marm to make sure it's targeting ARM mode, it doesn't use it. (In Thumb mode, the high 8 registers are not usable with many instructions.) GCC doesn't think it's call-preserved: even forcing it to copy a value to that register with register int y asm("r12") and an asm statement doesn't make it save/restore that register. https://godbolt.org/z/GMhbMscqT

`r12` is also call-clobbered. https://godbolt.org/z/f6f5oM6We — Timothy Baldwin, Jan 19 '22 at 16:03
@TimothyBaldwin: Thanks. That's a mystery, then; you'd think when tuning for an in-order ARM core, it would schedule an early load into R12. Unless its tuning heuristics are so set on Thumb mode that it doesn't like using R8..14? Probably nothing that simple. — Peter Cordes, Jan 19 '22 at 16:14
