GCC calling convention for x86_64 Linux systems

Question

I have written a minimal function to test whether I can call/link C and x86_64 assembly code.

Here is my main.c

#include <stdio.h>

extern int test(int);

int main(int argc, char* argv[])
{

    int a = 10;
    
    int b = test(a);

    printf("b=%d\n", b);

    return 0;
}

Here is my test.asm

section .text
    global test

test:
    mov ebx,2
    add eax,ebx
    ret

I built an executable using this script

#!/usr/bin/env bash

nasm -f elf64 test.asm -o test.o

gcc -c main.c -o main.o

gcc main.o test.o -o a.out

I wrote test.asm without having any real clue what I was doing. I then went away and did some reading, and now I don't understand how my code appears to be working, as I have convinced myself that it shouldn't be.

Here's a list of reasons why I think this shouldn't work:

- I don't save or restore the base pointer (set up the stack frame). I actually don't understand why this is needed, but every example I have looked at does this.
- The calling convention for the gcc compiler on Linux systems should be to pass arguments via the stack. Here I assume the arguments are passed using eax and ebx. I don't think that is right.
- ret probably expects to pick up a return address from somewhere. I am fairly sure I haven't supplied this.
- There may even be other reasons which I don't know about.

Is it a complete fluke that what I have written produces the correct output?

I am completely new to this. While I have heard of some x86 concepts in passing, this is the first time I have actually attempted to write some. Got to start somewhere?

Edit: For future reference, here is the corrected code:

section .text
    global test

test:
                    ; save old base pointer
    push rbp        ; roughly: sub rsp, 8 then mov [rsp], rbp
    mov rbp, rsp    ; rbp = rsp, initializing the new stack frame

    add rdi, 2      ; add 2 to the first argument (passed in rdi/edi)
    mov rax, rdi    ; the return value is passed in rax (eax for an int)

                    ; no local variables were allocated, so nothing is
                    ; subtracted from rsp and the stack pointer is unchanged

    pop rbp         ; restore old base pointer

    ret             ; pop the return address off the stack and jump to it
                    ; (call pushes the return address; ret pops it into rip)

A good place to start is [here](https://en.wikipedia.org/wiki/X86_calling_conventions) or [here](https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf) — Chris Dodd, May 16 '22 at 20:16
There's a summary of calling conventions at https://wiki.osdev.org/System_V_ABI. Trying to guess at them is a waste of time; it's something you're meant to learn by looking up (and maybe eventually memorizing). Your note about arguments being passed on the stack is probably from x86-32; for x86-64, the first six parameters are passed in registers. So look for your single `int` argument in `edi`. And as noted, `rbx` is one of the call-preserved registers so if you wish to use it (which is not necessary in this function), you must save and restore it. — Nate Eldredge, May 16 '22 at 20:20
Many of your questions can be answered by looking at what the C compiler does, since it obeys the same calling conventions. Here you go: https://godbolt.org/z/h4x9r4fE6 — David Grayson, May 16 '22 at 20:25
As David Grayson said, compilers can make working examples for you. See [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) for more, especially the video of Matt Godbolt's cppcon talk. — Peter Cordes, May 16 '22 at 22:02
Yes, you could write it that way, but why bother messing around with RBP? Waste of instructions here. Also, you declared it as taking/returning `int`, so there's no point spending extra code-size on REX prefixes to work with 64-bit registers instead of `add edi, 2` / `mov eax, edi` / `ret`. The caller will only look at the low 32 bits of the return value. Of course an optimizing compiler would take advantage of the special instruction LEA that allows x86 to copy-and-add, `lea eax, [rdi+2]` / `ret`. — Peter Cordes, May 19 '22 at 20:51
@PeterCordes I primarily put that there for learning purposes. If the purpose was to write something more complicated using stack allocated local variables this would be a good starting point. — FreelanceConsultant, May 19 '22 at 20:56
If you want to address your stack locals relative to RBP instead of RSP, then sure. RSP usually doesn't need to move after function entry in most functions, so there's no need for another "stable" point of reference. RBP frame pointers are mostly useful for simple debugging without DWARF stack unwind info, with nested calls. — Peter Cordes, May 19 '22 at 21:19
@PeterCordes If you wanted to call another function from this function, you would have to move the stack pointer. The point being it is a general approach, at least as far as I am presently aware. — FreelanceConsultant, May 19 '22 at 21:27
Yeah, but you still just do one thing in the prologue (`sub rsp, 8 + n*16`) before accessing any locals, and then reverse it in the epilogue, after the last access of a local. *During* the execution of the function, everything stays at fixed offsets from RSP, exactly like things would stay at fixed offsets from RBP. Because unlike 32-bit code, you aren't pushing function args (unless they take more than 6 integer args or whatever). See [this answer](https://stackoverflow.com/a/41914096) for example. Especially [x86\_64 : is stack frame pointer almost useless?](//stackoverflow.com/q/31417784) — Peter Cordes, May 19 '22 at 22:17
Just seems pointless and arbitrary to me to push/pop RBP in a function, like you're doing it as boiler-plate without understanding why it's there. You could indeed `call foo` after doing that without violating the ABI, but you'd have zero space for any locals; the `sub rsp, something` is the important part if you care about having space for local vars in a non-leaf function, not the RBP stuff. — Peter Cordes, May 19 '22 at 22:20
@PeterCordes I really don't get the point you're trying to make. If locals were needed one would just sub from the stack pointer and mov or possibly push some values. I could have extended the example to include more - eg add some local variables. But what would the point be if they were unused - I would have to write another function or write something more complicated which uses them. **Yes it is trivially obvious that an optimization here would be to not do any of the operations which are then un-done at the end of the function - but I'm sure most people would realize that?** — FreelanceConsultant, May 20 '22 at 08:27
My point is that setting up RBP as a frame pointer is not something you need to do even if you do want to use local vars. You were arguing that leaving it in as useful boiler-plate(?) makes sense; I'm arguing that it's not useful even as part of a more complicated function. (Until things get really complicated, like wanting to align RSP by 32 or more, or allocating runtime-variable amounts of stack space, like `alloca`.) Especially as a "correction" to your first attempt, it implies that leaving it out was incorrect. But it's not in any way, regardless of using any stack space. — Peter Cordes, May 20 '22 at 11:44

score 5 · Answer 1 · answered May 16 '22 at 20:17

> I don't save or restore the base pointer (set up the stack frame). I actually don't understand why this is needed, but every example I have looked at does this.

That's not needed. Try compiling some C code with -O3 and you'll see that GCC doesn't set up RBP as a frame pointer then.

> The calling convention for the gcc compiler on Linux systems should be to pass arguments via the stack. Here I assume the arguments are passed using eax and ebx. I don't think that is right.

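That part only works by fluke. Under the x86-64 System V calling convention used on Linux, the first integer argument is passed in edi (rdi), not on the stack, and the return value comes back in eax (rax). The code gcc generated for main happens to leave 10 in eax as well, the way you compiled it, but there's no guarantee this will always happen. Again, compile with -O3 and it won't anymore.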

> ret probably expects to pick up a return address from somewhere. I am fairly sure I haven't supplied this.

This part is fine. The return address is supplied by the caller: the call instruction pushes it, so it's always at the top of the stack when your function is entered.

> There may even be other reasons which I don't know about.

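Yes, there's one more: rbx (and therefore ebx) is callee-saved, meaning a function that modifies it must restore its original value before returning, but you're clobbering it. If the calling function (or anything further up the call stack) was relying on it, this would break it.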

> Is it a complete fluke that what I have written produces the correct output?

Yes, because of the second and fourth points above.

For reference, here's a comparison of the assembly that gcc produces from your C code at the -O0 (the default optimization level) and -O3: https://godbolt.org/z/7P13fbb1a

This is going to open a whole can of worms, but I see from the godbolt disassembly that before calling `test`, the compiler puts a value into `eax` and then immediately copies it to `edi`. Why? PS: May not reply further this evening. It's late. — FreelanceConsultant, May 16 '22 at 20:31
@FreelanceConsultant At `-O0`, compilers do all kinds of redundant stuff like that. See [Why does gcc create redundant assembly code?](https://stackoverflow.com/q/10663004/7509065) and [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394/7509065) — Joseph Sible-Reinstate Monica, May 16 '22 at 20:36
