No, library functions aren't special, except for a few like memcpy that can get inlined. In the resulting asm, if there's a call instruction, the asm function is non-leaf. If not, it's a leaf function (even if it ends with an optimized tailcall using jmp to another function).
Note that due to inlining (including of some library functions the compiler knows about, e.g. simple math, and string/memcpy for small constant sizes) and optimizing away calls to "pure" functions whose results are unused, a C function that makes calls may still optimize to an asm function that's a leaf. Also, recursion can sometimes be optimized into iteration.
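As a rough sketch of those three cases (the function names here are made up for illustration, not from your code): each of these has a call in the C source, but I'd expect a mainstream optimizing compiler to emit no call instruction for any of them. That depends on compiler, version, and options, so look at the asm rather than taking my word for it.

```c
#include <string.h>

/* memcpy with a small constant size: typically expanded inline
   as a plain load/store instead of a call to the library function. */
unsigned load_unsigned(const void *src) {
    unsigned v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* strlen has no side effects and its result is unused here,
   so the optimizer is allowed to delete the call entirely. */
int ignores_pure_result(const char *s) {
    (void)strlen(s);
    return 42;
}

/* Tail recursion with an accumulator: sums 1..n.
   The recursive call is in tail position, so it can become a loop. */
unsigned sum_to(unsigned n, unsigned acc) {
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);
}
```

With optimization disabled, all three would normally contain real calls, so whether a function is a leaf is a property of the generated asm, not of the C source.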
In the other direction, a compiler may optimize a loop into a call to memset or memcpy, for example a loop like for (size_t i=0 ; i<n ; i++) arr[i] = 0; This can make a non-leaf asm function even though the C source doesn't have any calls.
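Here's that loop wrapped in a complete function so you can try it yourself (zero_ints is just a name I picked). With GCC at -O3 I'd expect the loop to turn into a call (or tailcall) to memset; GCC's -fno-tree-loop-distribute-patterns option is the usual way to turn that pattern recognition off. Other compilers and versions may do something different.

```c
#include <stddef.h>

/* No function call appears in the source, but an optimizing compiler
   may recognize the zeroing loop and replace it with memset. */
void zero_ints(int *arr, size_t n) {
    for (size_t i = 0; i < n; i++)
        arr[i] = 0;
}
```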
Leaf vs. non-leaf matters in assembly because you have to re-align the stack before another call, and anything in call-clobbered registers like ECX gets clobbered by any function call. And you mentioned the red zone: a function that makes a call (or might on some paths of execution) can't keep anything in the red zone below RSP if it needs to stay live across the function call, or be written by the callee as an output. The call instruction pushes a return address right below RSP, and the callee is free to use the stack space below that.
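To make the register point concrete, here's a minimal sketch with hypothetical names (no_call, live_across_call, return_seven). It calls through a function pointer so the compiler can't see the callee when compiling live_across_call on its own.

```c
/* Leaf: both args arrive in call-clobbered registers and can
   simply be used there; nothing needs to be saved. */
int no_call(int a, int b) {
    return a + b;
}

/* Non-leaf: a is needed after the call, so the stand-alone version
   has to keep it somewhere the callee won't destroy, i.e. a
   call-preserved register or a stack slot (not the red zone). */
int live_across_call(int a, int (*callback)(void)) {
    int r = callback();   /* any call-clobbered register may change here */
    return a + r;
}

/* Example callee, only here so the sketch can be run. */
int return_seven(void) {
    return 7;
}
```

In the x86-64 System V convention, a arrives in EDI, which is call-clobbered, so I'd expect the asm for live_across_call to push/pop a call-preserved register like RBX, much like a hand-written non-leaf function would.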
Library functions follow the same calling convention as the code the compiler is generating now, so calling a library function doesn't relax any of those requirements. (On Windows there are multiple 32-bit calling conventions, but they all have the same set of call-clobbered registers, unless you're using the Irvine32 library of toy functions for hand-written asm, where all registers are call-preserved except a return value if there is one.)
Just the opposite, in fact: calling a function defined in the same source file can let the compiler inline it if it chooses, making the caller a leaf.
#include <stdlib.h>
#include <string.h>
int bar(int,int);
int leaf(int *p, int a){
    *p = 0;
    int c = (a<10);
    memcpy(&a, &c, sizeof(a)); // defined by GCC as __builtin_memcpy, optimizes away to a=c
    *p = c;
    return bar(a, a); // tailcall
}
This compiles fairly simply with GCC 11.2 -O3 for x86-64:
leaf:
    mov r8, rdi # silly compiler, could have avoided this by materializing the boolean into ESI, leaving EDI untouched until after the store
    xor edi, edi # zero a register to setcc into to get a zero-extended 0/1
    cmp esi, 9
    setle dil # EDI = (a<=9), i.e. (a<10)
    mov DWORD PTR [r8], edi # store to the pointer. The store of 0 earlier is optimized out as a dead store
    mov esi, edi # copy the arg for bar(a,a)
    jmp bar
Note that the *p = 0; store got optimized away (dead store elimination) because we store something else to the same place. And unlike the non_leaf function below, we don't call any code between the two stores that might read some global variable (which p might be pointing to). Most library functions aren't special-cased by the compiler as not touching any global state, although many math library functions are. So the compiler has to have all memory (except for non-escaped local vars) in sync with the C abstract machine when function calls are made to non-inline functions it doesn't know anything about (functions that are "opaque" to the optimizer). That includes all functions that aren't defined in this compilation unit, unless you use link-time optimization to allow cross-file inter-procedural analysis/optimization and inlining.
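A small sketch of the "in sync except for non-escaped locals" rule, using names I made up (counter, read_counter, touch_global, touch_local). The callback is passed as a function pointer so the callee is opaque when each function is compiled stand-alone.

```c
int counter;   /* global: any opaque function might read or write it */

/* Example callee that really does read the global. */
int read_counter(void) {
    return counter;
}

int touch_global(int (*cb)(void)) {
    counter = 1;     /* must actually be stored before the call */
    int r = cb();    /* cb might read counter */
    counter = 2;
    return r;
}

int touch_local(int (*cb)(void)) {
    int local = 1;   /* address never escapes, so cb can't observe it */
    int r = cb();
    local = 2;       /* compiler can keep only the final value */
    return r + local;
}
```

In touch_global I'd expect both stores to counter to survive in the asm, while in touch_local the variable probably never exists in memory at all and the function just adds the constant 2 to the callback's return value.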
The non-leaf function doesn't look much different in C, but I picked a C library function that the compiler doesn't inline. I even left out the a<10 part, but it's still much more asm.
int non_leaf(int *p, int a){
    *p = 0;
    int c = rand();
    *p = c;
    return bar(a, a);
}
non_leaf:
    push rbp # save a call-preserved reg
    mov ebp, esi # use it to save a for use after rand
    push rbx
    mov rbx, rdi # and another for the pointer
    sub rsp, 8 # realign the stack by 16
    # end of function prologue
    mov DWORD PTR [rdi], 0 # *p = 0; not optimized away because GCC doesn't know that p won't be pointing into memory that rand() reads or writes
    call rand # int c = rand()
    mov esi, ebp
    mov edi, ebp # set up both args for the tailcall to bar
    mov DWORD PTR [rbx], eax # store *p = c
    # start of function epilogue
    add rsp, 8 # epilogue: restore stack stuff back to function-entry state
    pop rbx
    pop rbp
    jmp bar # tailcall with edi=esi = incoming ESI
Note that this is the only version that has to do anything with the stack, or any other kind of prologue/epilogue. In a leaf function, you'd only push/pop call-preserved registers if you ran out of call-clobbered registers for a complicated function.
Calling a function you defined in another file would look exactly the same. (Unless you used -flto, or -fwhole-program with both files on the same gcc command line.)
But calling a function defined in this file so it can inline is different:
int helper(int a){
    return a+1;
}
int call_inline(int *p, int a){
    *p = 0;
    int c = helper(a);
    *p = c;
    return c;
}
# Not declared static, so GCC emits a stand-alone definition in case another file wants to call it.
helper:
    lea eax, [rdi+1]
    ret
# But the call from this function fully inlined:
call_inline:
    lea eax, [rsi+1] # c(eax) = helper(a) = a+1
    mov DWORD PTR [rdi], eax # *p = c;
    ret # return c (still in EAX)
I also removed the tailcall to bar(), but that would only be one extra instruction.
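If you don't need the helper to be callable from other files, declaring it static is the usual fix for the stand-alone definition. A sketch with renamed functions (helper_static and call_inline_static are my names, to avoid clashing with the versions above):

```c
/* static: visible only in this file, so once every call is inlined
   the compiler has no reason to emit a stand-alone copy. */
static int helper_static(int a) {
    return a + 1;
}

int call_inline_static(int *p, int a) {
    *p = 0;                     /* dead store, overwritten below */
    int c = helper_static(a);   /* expected to inline to a+1 */
    *p = c;
    return c;
}
```

I'd expect the asm for call_inline_static to match call_inline, with no separate helper_static label in the output when optimization is enabled.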