Use of LR and PC instructions in non-leaf and leaf functions epilogue

Question

I am trying to learn assembly through the guide from azeria-labs.com.

I have a question about the use of the LR register and the PC register in the epilogue of non-leaf and leaf functions.

In the snippet below they show the difference for the epilogue in these functions.

If I write a program in C and look at it in GDB, it always uses "pop {r11, pc}" for a non-leaf function and "pop {r11}; bx lr" for a leaf function. Can anybody tell me why this is?

When I am in a leaf function, does it make a difference whether I use "bx lr" or "pop pc" to go back to the parent function?

/* An epilogue of a leaf function */ 
pop    {r11}        
bx     lr           

/* An epilogue of a non-leaf function */
pop    {r11, pc}

`pop` only works if you pushed something before. In leaf functions you don't normally destroy `lr` so it's not normally pushed, hence it can't be popped which is why `bx lr` is used. Non-leaf functions need to save `lr` because calling another function will destroy that, so it is pushed and can be popped. That said, nothing forbids pushing `lr` to the stack in leaf functions too so you can use the same prologue/epilogue. — Jester, Aug 16 '19 at 16:57
Leaf or not, if the function does not modify a register that must be preserved per the calling convention, then it does not need to push that register on the stack, lr being one of those. Note that your compiler didn't push r4, r5, r6... either. lr is more sensitive in that, whatever the calling convention, so long as lr is used as the return address (if bl/blx are desired to be used), you need to preserve it if the function modifies it. — old_timer, Aug 16 '19 at 17:10
to conform to the current standard which your code/compiler or at least code shown does not. lr could be the extra register used to keep the stack aligned for a leaf function, and not because it has to. — old_timer, Aug 16 '19 at 17:12
Sounds like you're looking at un-optimized code that always creates a frame pointer, otherwise it wouldn't have to pop anything (unless it saved/restored some call-preserved registers to use as temporaries in a more complex leaf function. Or if it chose to push `lr` itself to use *it* as a temporary). A normal function will *only* use `bx lr`. https://godbolt.org/z/Wrm-ls — Peter Cordes, Aug 17 '19 at 01:07

score 3 · Answer 1 · edited Jun 20 '20 at 09:12

I am trying to learn assembly

I have a question about the use of the LR register and the PC register in the epilogue of non-leaf and leaf functions.

This is part of the beauty and pain of assembler. There are no rules for the use of anything. It is up to you to decide what is needed. Please see: ARM Link and frame pointer, as it may be helpful.

... it will always use pop {r11, pc} for a non-leaf function and pop {r11}; bx lr for a leaf function. Can anybody tell me why this is?

A 'C' compiler is different. It has rules called an ABI. The current version is called AAPCS; it covers both ARM and Thumb code and superseded the older APCS (ARM) and ATPCS (Thumb). These rules exist so that different compilers can call each other's functions,^note1 i.e., so that tools can interoperate. You can follow this 'rule' in assembler or you can disregard it. If your goal is to interoperate with a compiler's code, you need to follow that ABI's rules.

Some of the rules say what needs to be pushed on the stack and how registers are used. The 'reason' that the leaf is different is that it is more efficient. Leaving the return address in the register lr is much faster than writing it to memory (pushing it to the stack). In a non-leaf function, a function call will destroy the existing lr and you would not return to the right place afterwards, so lr is pushed to the stack to make things work.
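
To see the difference for yourself, a minimal sketch along these lines can be compiled to assembly (for example with `arm-none-eabi-gcc -O0 -S`, assuming that toolchain is installed). The function names `leaf` and `non_leaf` are made up for illustration, and `__attribute__((noinline))` is a GCC/Clang extension used here only so the call is not optimized away. The exact instructions you get depend on the compiler version and flags.

```c
/* Hypothetical example functions; the names are illustrative only. */

/* Leaf: makes no calls, so nothing overwrites lr and the return
 * address can stay in the register for the whole function. */
__attribute__((noinline))
int leaf(int a, int b)
{
    return a + b;
}

/* Non-leaf: the call to leaf() is a bl, which overwrites lr, so the
 * compiler has to save lr (typically on the stack) before the call. */
int non_leaf(int a)
{
    return leaf(a, 1) + 1;
}
```

In the generated assembly, `non_leaf` should save and restore the return address while `leaf` should not need to.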

When I am in a leaf function, does it make a difference whether I use "bx lr" or "pop pc" to go back to the parent function?

The bx lr is faster than the pop pc because the pop reads the return address from memory and bx lr does not. Functionally they are the same, provided lr was pushed in the prologue so that there is something to pop. However, one common reason to use assembler is to be faster. You end up with the same execution path either way; the pop version just takes longer, and how much longer depends on the memory system. It could be next to negligible for a Cortex-M with TCM or very high for Cortex-A CPUs.

ARM uses registers to pass parameters because this is faster than pushing parameters on the stack. Consider this code,

int foo(int a, int b, int c) {return a+b+c;}
int bar(int a) { return foo(a, 1, 2);}

Here is one possible ARM version ^note2,

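  foo:
    pop {r0, r1}   ; r0 = c, r1 = b (last pushed comes off first)
    add r0,r0,r1   ; only two registers needed.
    pop {r1}       ; r1 = a
    add r0,r0,r1
    bx  lr

  bar:
    push {lr}      ; bl below overwrites lr, so save it first
    push {r0}      ; notice we are only using one register?
    mov r0, #1
    push {r0}
    mov r0, #2
    push {r0}
    bl foo
    pop {pc}       ; return by popping the saved lr into pc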

This is not how any ARM compiler will do things. The convention is to use R0, R1, and R2 to pass the parameters, because this is faster and actually produces less code. Either way achieves the same thing. Maybe,

  foo:
   add r0,r0,r1  ; a = a + b
   add r0,r0,r2  ; a = a + c
   bx  lr

  bar:
   push {lr}   ; save lr; r0 already holds a from the caller of bar.
   mov r1, #1  ; b = 1
   mov r2, #2  ; c = 2
   bl foo
   pop {pc}
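
Under AAPCS the first four word-sized arguments travel in r0 to r3 and any further ones go on the stack, so the same trade-off shows up when compiling C. A small sketch, with made-up function names, that can be compiled with `-S` to compare the two callers:

```c
/* Hypothetical functions for comparing argument passing. */

/* Three int arguments: all fit in r0-r2 under AAPCS. */
int sum3(int a, int b, int c)
{
    return a + b + c;
}

/* Six int arguments: a-d go in r0-r3, e and f are passed on the stack. */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* Callers: compare how each one sets up its arguments before the call. */
int call_sum3(int a) { return sum3(a, 1, 2); }
int call_sum6(int a) { return sum6(a, 1, 2, 3, 4, 5); }
```

With optimization turned up the compiler may inline these calls, so a low optimization level is probably the easiest way to see the argument setup.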

The lr is somewhat similar to the parameters. You could push the parameters on the stack or just leave them in registers. You could put the lr on the stack and then pop it off later, or you can just leave it in the register. What should not be underestimated is how much faster code can become when it uses registers as opposed to memory. Moving things around is generally a sign that assembler code is not optimal. The more mov, push and pop you have, the slower your code is.

So generally quite a bit of thought went into the ABI to make it as fast as possible. The older APCS is slightly slower than the newer AAPCS, but they both work.

Note1: You will notice a difference between static and non-static functions if you turn up optimizations. This is because the compiler may ignore the ABI to be faster. Static functions cannot be called directly from code built by another compiler and so don't need to interoperate.

Note2: In fact the CPU designers think a lot about the ABI and take into consideration how many registers to provide. Too many registers and the opcodes will be big. Too few and there will be lots of memory used instead of registers.
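
As a rough illustration of Note1, here is a minimal sketch with hypothetical names (`scale`, `scaled_sum`). Comparing the compiler output at `-O0` and `-O2` should show the static helper being treated more freely than the externally visible function, though what exactly happens is up to the compiler.

```c
/* Hypothetical example; the names are illustrative only. */

/* Internal linkage: only this translation unit can call it directly, so an
 * optimizing compiler may inline it or use a non-standard calling sequence. */
static int scale(int x)
{
    return x * 3;
}

/* External linkage: callable from other object files, so it has to follow
 * the ABI at its entry and exit. */
int scaled_sum(int a, int b)
{
    return scale(a) + scale(b);
}
```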

GCC likes to keep the stack aligned; if it doesn't have to care about Thumb interworking, it will sometimes make code like `push {r4, lr}` and `pop {r4, pc}` that save/restore `r4` in non-leaf functions even if it doesn't use it, just to avoid spending more instructions fixing up `sp` to maintain stack alignment. — Peter Cordes, Aug 17 '19 at 01:11
I was making 2 points: understanding *why* a C compiler does that is helpful if you're trying to learn from reading its output. And 2nd: if you're going to push anything, you can save code-size by also pushing/popping `lr`/`pc` instead of a separate `bx lr` (unless you need interworking), and it frees up `lr` for scratch space within the leaf function. Mostly just pointing it out as a useful trick / idiom for compact code. — Peter Cordes, Aug 17 '19 at 02:33

score 0 · Answer 2 · answered Aug 16 '19 at 16:59

In the leaf function, there are no other function calls which would modify the link register lr.

For a non-leaf function, the lr must be preserved, done here by pushing it to the stack (somewhere not shown, earlier in the function).

The epilogue of the non-leaf function could be rewritten:

pop    {r11, lr}
bx     lr

This is however one more instruction, and so it is slightly less efficient.
