Does calling library functions still make them non-leaf? how are library functions handled by x86 assembly?

Question

So, when a C program (or one in some other language) has a function (funcA) that calls another function (funcB) in the same program, funcA is considered non-leaf because it calls other functions. Therefore, a stack frame and everything else is set up instead of making use of the red zone.

However, say funcB doesn't call any functions that we wrote in the program itself, but it does call a library function or two, such as fscanf() or fopen() (I don't believe the specific function matters, as long as it's a library function). Would funcB then still be non-leaf, because it's calling another function? How are library functions handled in x86?

Analyzing some of the generated x86, it's clear that there are no obvious jumps happening, but I can see instructions like call __isoc99_fscanf@PLT and call perror@PLT.
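To make the setup concrete, here is a minimal sketch with hypothetical funcA and funcB (the file-reading logic is an assumption, not code from the question). Built with GCC for x86-64 Linux, each library call in funcB typically shows up as its own call instruction, such as the call __isoc99_fscanf@PLT and call perror@PLT lines seen on glibc:

```c
#include <stdio.h>

/* Hypothetical funcB: it calls only C library functions, never our own code. */
int funcB(const char *path, int *out)
{
    FILE *f = fopen(path, "r");        /* a call to fopen in the asm */
    if (f == NULL) {
        perror("fopen");               /* a call to perror */
        return -1;
    }
    int n = fscanf(f, "%d", out);      /* on glibc this may appear as __isoc99_fscanf */
    fclose(f);                         /* a call to fclose */
    return (n == 1) ? 0 : -1;
}

/* Hypothetical funcA: calls funcB, which is defined in the same program. */
int funcA(const char *path)
{
    int value = 0;
    if (funcB(path, &value) != 0)
        return -1;
    return value;
}
```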

It depends on your tutor’s definition of “leaf function”. I'd call `funcB()` a non-leaf function, but I'm not your tutor. I don't think the concept is particularly useful. — Jonathan Leffler, Apr 12 '22 at 02:54

score 6 · Accepted Answer · answered Apr 12 '22 at 04:19

No, library functions aren't special, except for a few like memcpy that can get inlined. In the resulting asm, if there's a call instruction, the asm function is non-leaf. If not, it's a leaf function (even if it ends with an optimized tailcall using jmp to another function).

Note that due to inlining (including of some library functions the compiler knows about, e.g. simple math, and string functions or memcpy for small constant sizes), and to optimizing out calls to "pure" functions whose results are unused, a C function that makes calls may still compile to an asm function that's a leaf. Recursion can also sometimes be optimized into iteration.
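A small sketch of two of these effects, assuming GCC or Clang with optimization enabled (the function names are hypothetical). strlen of a string literal is usually folded to a constant, and the tail-recursive gcd is typically turned into a loop, so neither function needs a call instruction in the asm even though the C source contains calls:

```c
#include <stddef.h>
#include <string.h>

/* strlen on a string literal: usually folded to the constant 5 at -O1 and up,
   so no call to strlen is emitted. */
size_t greeting_len(void)
{
    return strlen("hello");
}

/* Tail-recursive Euclid gcd: at -O2 GCC typically replaces the self-call
   with a jump back to the top (iteration), leaving a leaf function. */
unsigned gcd(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b);
}
```

Whether a given build actually does this depends on compiler version and flags, so check the asm (e.g. on Godbolt) rather than relying on it.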

In the other direction, a compiler may optimize a loop into a call to memset or memcpy, for example a loop like for (size_t i=0 ; i<n ; i++) arr[i] = 0;. This can make a non-leaf asm function even though the C source doesn't have any calls.
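Here is that loop wrapped in a hypothetical function. GCC's -ftree-loop-distribute-patterns optimization can recognize this idiom and emit a call to memset instead (which -O levels enable it depends on the GCC version), so the resulting asm may be non-leaf:

```c
#include <stddef.h>

/* Plain zeroing loop with no calls in the C source. With the right
   optimization level, GCC may compile the loop body into a call to memset. */
void zero_array(int *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        arr[i] = 0;
}
```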

Leaf vs. non-leaf matters in assembly because you have to re-align the stack before another call, and stuff in call-clobbered registers like ECX gets clobbered by any function call. And you mentioned the red zone: a function that makes a call (or might, on some paths of execution) can't keep anything in the red zone below RSP if it needs to live across the function call, or be written by the function call as an output.
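A sketch of the red-zone difference, assuming GCC targeting x86-64 System V (Windows x64 has no red zone); the function names are hypothetical. The volatile array forces buf into memory. In the leaf version, GCC can typically address it below RSP without adjusting RSP at all. In the non-leaf version, buf has to survive the call to twice(), whose return-address push would overwrite the red zone, so GCC has to reserve real stack space first:

```c
/* Leaf: buf can typically live in the red zone, no sub rsp needed. */
int leaf_uses_red_zone(int x)
{
    volatile int buf[4];               /* volatile: keep buf in memory */
    for (int i = 0; i < 4; i++)
        buf[i] = x + i;
    return buf[0] + buf[3];
}

/* noinline keeps this as a real call instruction in the caller. */
__attribute__((noinline)) int twice(int v)
{
    return 2 * v;
}

/* Non-leaf: buf is read after the call, so it must sit above RSP,
   and RSP must be 16-byte aligned at the call. */
int nonleaf_needs_frame(int x)
{
    volatile int buf[4];
    for (int i = 0; i < 4; i++)
        buf[i] = x + i;
    int t = twice(x);
    return buf[0] + buf[3] + t;
}
```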

Library functions follow the same calling convention as ones the compiler is generating now, so calling a library function doesn't relax any of those requirements. (On Windows there are multiple 32-bit calling conventions, but they all have the same set of call-clobbered registers. Unless you're using the Irvine32 library of toy functions for hand-written asm, where all registers are call-preserved except a return value if there is one.)

Just the opposite, in fact: calling a function defined in the same source file can let the compiler inline it if it chooses, making the caller a leaf.

Examples (on Godbolt)

#include <stdlib.h>
#include <string.h>

int bar(int,int);

int leaf(int *p, int a){
    *p = 0;
    int c = (a<10);
    memcpy(&a, &c, sizeof(a));   // defined by GCC as __builtin_memcpy, optimizes away to a=c
    *p = c;
    return bar(a, a);           // tailcall
}

This compiles fairly simply with GCC 11.2 -O3 for x86-64:

leaf:
        mov     r8, rdi            # silly compiler, could have avoided this by materializing the boolean into ESI, leaving EDI untouched until after the store
        xor     edi, edi           # zero a register to setcc into to get a zero-extended 0/1
        cmp     esi, 9
        setle   dil                # EDI = (a<=9) // (a<10)
        mov     DWORD PTR [r8], edi  # store to the pointer.  The store of 0 earlier is optimized out as a dead store
        mov     esi, edi           # copy the arg for bar(a,a)
        jmp     bar

Note that the *p = 0; store got optimized away (dead store elimination) because we store something else to the same place. And unlike the function below, we don't call any code that might read some global variable (which p might be pointing to). Most library functions aren't special-cased by the compiler as not touching any global state, although many math library functions are. So the compiler has to have all memory (except for non-escaped local vars) in sync with the C abstract machine when calling non-inline functions it doesn't know anything about, i.e. functions that are "opaque" to the optimizer. That includes all functions that aren't defined in this compilation unit, unless you use link-time optimization to allow cross-file inter-procedural analysis/optimization and inlining.
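A sketch of why the first store matters once there's a call in between (names are hypothetical). peek_global is marked noinline so it stays a real call; because it reads the global g and p may point at g, the *p = 0 store is observable and cannot be eliminated, unlike in leaf() above:

```c
int g;   /* a global that p might point to */

/* noinline keeps this as a real call; it reads g. */
__attribute__((noinline)) int peek_global(void)
{
    return g;
}

int store_around_call(int *p)
{
    *p = 0;                  /* not dead: peek_global() sees it when p == &g */
    int c = peek_global();
    *p = c + 1;
    return c;
}
```

When p aliases g the call returns the freshly stored 0; when it doesn't, the call returns whatever g held. Removing the first store would change the first case's result.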

The non-leaf function doesn't look much different in C, but I picked a C library function that the compiler doesn't inline. I even left out the a<10 part, but it's still much more asm.

int non_leaf(int *p, int a){
    *p = 0;
    int c = rand();
    *p = c;
    return bar(a, a);
}

non_leaf:
        push    rbp                  # save a call-preserved reg
        mov     ebp, esi             # use it to save a for use after rand
        push    rbx
        mov     rbx, rdi             # and another for the pointer
        sub     rsp, 8               # realign the stack by 16
   # end of function prologue
        mov     DWORD PTR [rdi], 0   # *p = 0;   not optimized away because GCC doesn't know that p won't be pointing into memory that rand() reads or writes
        call    rand                 # int c = rand()
        mov     esi, ebp
        mov     edi, ebp             # set up both args for the tailcall to bar
        mov     DWORD PTR [rbx], eax # store *p = c
   # start of function epilogue
        add     rsp, 8               # epilogue: restore stack stuff back to function-entry state
        pop     rbx
        pop     rbp
        jmp     bar                  # tailcall with edi=esi = incoming ESI

Note that this is the only version that has to do anything with the stack, or any other kind of prologue/epilogue. In a leaf function, you'd only push/pop call-preserved registers if you ran out of call-clobbered registers for a complicated function.

Calling a function you defined in another file would look exactly the same (unless you used -flto, or -fwhole-program with both files on the same gcc command line).

But calling a function defined in this file so it can inline is different:

int helper(int a){
    return a+1;
}
int call_inline(int *p, int a){
    *p = 0;
    int c = helper(a);
    *p = c;
    return c;
}

# Not declared static, so GCC emits a stand-alone definition in case another file wants to call it.
helper:
        lea     eax, [rdi+1]
        ret

# But the call from this function fully inlined:
call_inline:
        lea     eax, [rsi+1]              # c(eax) = helper(a) = a+1
        mov     DWORD PTR [rdi], eax      # *p = c;
        ret                               # return c (still in EAX)

I also removed the tailcall to bar(), but that would only be one extra instruction.
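For comparison, a variant with the helper declared static (hypothetical names). Since no other file can call a static function, once GCC inlines its only call it usually doesn't emit a stand-alone definition at all, so only the leaf caller remains in the asm:

```c
/* static: not visible outside this file, so no stand-alone copy is needed
   once every call site has been inlined. */
static int helper_static(int a)
{
    return a + 1;
}

int call_inline_static(int *p, int a)
{
    *p = 0;                     /* dead store, removable since no call remains */
    int c = helper_static(a);   /* typically inlined to a single lea */
    *p = c;
    return c;
}
```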
