How do you call C functions from Assembly and how do you link it Statically?

Question

I am playing around and trying to understand the low-level operation of computers and programs. To that end, I am experimenting with linking Assembly and C.

I have 2 program files:

Some C code here in "callee.c":

#include <unistd.h>

void my_c_func() {
  write(1, "Hello, World!\n", 14);
  return;
}

I also have some GAS x86_64 Assembly here in "caller.asm":

.text

.globl my_entry_pt

my_entry_pt:
  # call my c function
  call my_c_func # this function has no parameters and no return data

  # make the 'exit' system call
  mov $60, %rax # set the syscall to the index of 'exit' (60)
  mov $0, %rdi # set the single parameter, the exit code to 0 for normal exit
  syscall

I can build and execute the program like this:

$ as ./caller.asm -o ./caller.obj
$ gcc -c ./callee.c -o ./callee.obj
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out -dynamic-linker /lib64/ld-linux-x86-64.so.2
$ ldd ./prog.out
    linux-vdso.so.1 (0x00007fffdb8fe000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f46c7756000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f46c7942000)
$ ./prog.out
Hello, World!

Along the way, I had some problems. If I don't set the -dynamic-linker option, it defaults to this:

$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
$ ldd ./prog.out
    linux-vdso.so.1 (0x00007ffc771c5000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f8f2abe2000)
    /lib/ld64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x00007f8f2adce000)
$ ./prog.out
bash: ./prog.out: No such file or directory

Why is this? Is there a problem with the linker defaults on my system? How can/should I fix it?

Also, static linking doesn't work.

$ ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
ld: ./callee.obj: in function `my_c_func':
callee.c:(.text+0x16): undefined reference to `write'

Why is this? Shouldn't write() just be a c library wrapper for the syscall 'write'? How can I fix it?

Where can I find the documentation on the C function calling convention so I can read up on how parameters are passed back and forth, etc...?

Lastly, while this seems to work for this simple example, am I doing something wrong in my initialization of the C stack? Right now, I'm doing nothing. Should I be allocating memory from the kernel for the stack, setting bounds, and setting %rsp and %rbp before I start calling functions? Or is the kernel loader taking care of all this for me? If so, will all architectures under a Linux kernel take care of it for me?

@JosephSible-ReinstateMonica gcc by default attempts to link in crt1.o which includes the c library provided "_start". This includes a call to "main" which causes a link error since "main" doesn't exist. I was able to solve that issue by including an empty and unused main function in "callee.c", but it seems very hacky to have to do. This also means that there is a bunch of unused code getting linked into my program. — Echelon X-Ray, Jun 14 '20 at 07:42
@JosephSible-ReinstateMonica If I link that statically, I get a segfault at runtime. Using GDB, I traced the problem to the first instruction in the function write() from glibc. The instruction is "mov %fs:0x18,%eax". The value of %fs at the point of failure is 0. I don't know what the purpose of that instruction is, but I suspect that I am violating some spec of the ABI and need to do some Stack initialization. — Echelon X-Ray, Jun 14 '20 at 07:48
Why do you want to use C functions from an assembler main program? Commonly it's the other way around. -- The functions of the C standard library might need some initialization. I'd be afraid to miss essential stuff without crt1. -- You can use the option `-nostartfiles` to prevent linking of startup code and the call of `main()`. — the busybee, Jun 14 '20 at 08:53
@EchelonX-Ray: use `gcc -nostartfiles` to link without CRT, but still *with* libc and the correct path for the ELF interpreter. This will only work in a dynamically-linked executable, because glibc needs to initialize itself somehow, either by dynamic-linker hooks or by your `_start` calling its init functions in the right order, which your entry point doesn't do. — Peter Cordes, Jun 14 '20 at 11:30

score 4 · Accepted Answer · edited Jun 15 '20 at 14:14

While the Linux kernel provides a syscall named write, it does not mean that you automatically get a wrapper function of the same name you can call from C as write(). In fact, you need inline assembly to call any syscalls from C, if you're not using libc, because libc defines those wrapper functions.

Instead of explicitly linking your binaries with ld, let gcc do it for you. It can even assemble assembly files (internally executing a suitable version of as), if the source ends with a .s suffix. It looks like your linking problems are simply a disagreement between what GCC assumes and how you do it via LD yourself.

No, it's not a bug; the ld default path for ld.so isn't the one used on modern x86-64 GNU/Linux systems. (/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2)

Linux uses the System V ABI. The AMD64 Architecture Processor Supplement (PDF) describes the initial execution environment (when _start gets invoked), and the calling convention. Essentially, you have an initialized stack, with environment and command-line arguments stored in it.
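To make the "initialized stack" concrete, here is a minimal sketch in plain C of how the words at the entry stack pointer are laid out according to the psABI: argc first, then the argv pointers, a NULL, the environment pointers, and another NULL (the auxiliary vector follows, and is ignored here). The names entry_stack_view and parse_entry_stack are made up for illustration. A real _start would have to obtain the initial %rsp in assembly before any C code touches the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the process entry stack (names are illustrative only). */
struct entry_stack_view {
    long   argc;   /* word at 0(%rsp) on entry            */
    char **argv;   /* argc pointers, followed by a NULL   */
    char **envp;   /* environment pointers, NULL-ended    */
};

/* sp is assumed to hold the value %rsp had at the very first instruction. */
static inline struct entry_stack_view parse_entry_stack(char **sp)
{
    struct entry_stack_view v;

    v.argc = (long)(intptr_t)sp[0];   /* argc is stored as a full 8-byte word */
    v.argv = sp + 1;                  /* argv[0] is the next word             */
    v.envp = v.argv + v.argc + 1;     /* skip argv[] and its terminating NULL */
    return v;
}
```

Nothing here needs libc, so the same function could be compiled with -ffreestanding -nostdlib.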

Let's construct a fully working example, containing both C and assembly (AT&T syntax) sources, and producing both a static and a dynamic binary.

First, we need a Makefile to save typing long commands:

# SPDX-License-Identifier: CC0-1.0

CC      := gcc
CFLAGS  := -Wall -Wextra -O2 -march=x86-64 -mtune=generic -m64 \
           -ffreestanding -nostdlib -nostartfiles
LDFLAGS :=

all: static-prog dynamic-prog

clean:
    rm -f static-prog dynamic-prog *.o

%.o: %.c
    $(CC) $(CFLAGS) $^ -c -o $@

%.o: %.s
    $(CC) $(CFLAGS) $^ -c -o $@

dynamic-prog: main.o asm.o
    $(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@

static-prog: main.o asm.o
    $(CC) -static $(CFLAGS) $^ $(LDFLAGS) -o $@

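Makefiles are particular about their indentation, but SO converts tabs to spaces. So, after pasting the above, run sed -e 's|^  *|\t|' -i Makefile to convert the leading spaces back to tabs. (Note the two spaces before the asterisk: the pattern must match at least one space, otherwise every line gets a tab prepended.)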

The SPDX License Identifier in the above Makefile and all following files tells you that these files are licensed under the Creative Commons Zero license: that is, they are all dedicated to the public domain.

Compilation flags used:

-Wall -Wextra: Enable all warnings. It is a good practice.
-O2: Optimize the code. This is a commonly used optimization level, usually considered sufficient and not too extreme.
-march=x86-64 -mtune=generic -m64: Compile to 64-bit x86-64 AKA AMD64 architecture. These are the defaults; you can use -march=native to optimize for your own system.
-ffreestanding: Compilation targets the freestanding C environment. Tells the compiler it can't assume that strlen or memcpy or other library functions are available, so don't optimize a loop, struct copy, or array initialization into calls to strlen, memcpy, or memset, for example. If you do provide asm implementations of any functions gcc might want to invent calls to, you can leave this out. (Especially if you're writing a program that will run under an OS)
-nostdlib -nostartfiles: Do not link in the standard C library or its startup files. (Actually, -nostdlib already "includes" -nostartfiles, so -nostdlib alone would suffice.)

Next, let's create a header file, nolib.h, that implements nolib_exit() and nolib_write() wrappers around the exit_group and write syscalls:

// SPDX-License-Identifier: CC0-1.0

/* Require Linux on x86-64 */
#if !defined(__linux__) || !defined(__x86_64__)
#error "This only works on Linux on x86-64."
#endif

/* Known syscall numbers, without depending on glibc or kernel headers */
#define SYS_write         1
#define SYS_exit_group  231
 // Normally you'd use
 // #include <asm/unistd.h> for __NR_write and __NR_exit_group
 // or even  #include <sys/syscall.h>   for SYS_write



/* Inline assembly macro for a single-parameter no-return syscall */
#define SYSCALL1_NORET(nr, arg1) \
    __asm__ volatile ( "syscall\n\t" : : "a" (nr), "D" (arg1) : "rcx", "r11", "memory")

/* Inline assembly macro for a three-parameter syscall */
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
    __asm__ volatile ( "syscall\n\t" : "=a" (retval) : "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) : "rcx", "r11", "memory" )

/* exit() function */
static inline void nolib_exit(int retval)
{
    SYSCALL1_NORET(SYS_exit_group, retval);
}

/* Some errno values */
#define  EINTR    4     /* Interrupted system call */
#define  EBADF    9     /* Bad file descriptor */
#define  EINVAL  22     /* Invalid argument */
 // or   #include <asm/errno.h>  to define these

/* write() syscall wrapper - returns negative errno if an error occurs */
static inline long nolib_write(int fd, const void *data, long len)
{
    long  retval;

    if (fd == -1)
        return -EBADF;
    if (!data || len < 0)
        return -EINVAL;

    SYSCALL3(retval, SYS_write, fd, data, len);

    return retval;
}

The reason nolib_exit() uses the exit_group syscall instead of the exit syscall is that exit_group ends the entire process, while exit ends only the calling thread. If you run a program under strace, you'll see that it too calls the exit_group syscall at the very end. (See also "Syscall implementation of exit()".)
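A raw syscall reports failure by returning a value in the range -4095..-1 (the negated errno), and that is what nolib_write() passes through to its caller. The sketch below shows one way a caller might tell an error from a byte count. NOLIB_MAX_ERRNO, nolib_is_error and nolib_errno_of are hypothetical helper names, not part of the header above; the 4095 limit is the one the Linux kernel uses for its own error-pointer range.

```c
#include <stdbool.h>

/* Assumption: largest errno value the kernel encodes in a return value. */
#define NOLIB_MAX_ERRNO 4095L

/* True if a raw syscall return value encodes an error (-4095..-1). */
static inline bool nolib_is_error(long ret)
{
    /* As unsigned, the error range is the top 4095 values. */
    return (unsigned long)ret >= (unsigned long)-NOLIB_MAX_ERRNO;
}

/* Positive errno for an error return, or 0 if the call succeeded. */
static inline int nolib_errno_of(long ret)
{
    return nolib_is_error(ret) ? (int)-ret : 0;
}
```

With these, a check such as nolib_errno_of(nolib_write(1, msg, len)) == EINTR reads much like the familiar errno test, without needing a global errno variable.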

Next, we need some C code. main.c:

// SPDX-License-Identifier: CC0-1.0

#include "nolib.h"

const char *c_function(void)
{
    return "C function";
}

static inline long nolib_put(const char *msg)
{
    if (!msg) {
        return nolib_write(1, "(null)", 6);
    } else {
        const char *end = msg;
        while (*end)
            end++;           // strlen
        if (end > msg)
            return nolib_write(1, msg, (unsigned long)(end - msg));
        else
            return 0;
    }
}

extern const char *asm_function(int);

void _start(void)
{
    nolib_put("asm_function(0) returns '");
    nolib_put(asm_function(0));
    nolib_put("', and asm_function(1) returns '");
    nolib_put(asm_function(1));
    nolib_put("'.\n");

    nolib_exit(0);
}

nolib_put() is just a wrapper around nolib_write(), that finds the end of the string to be written, and calculates the number of characters to be written based on that. If the parameter is a NULL pointer, it prints (null).

Because this is a freestanding environment, and the default name for the entry point is _start, this defines _start as a C function that never returns. (It must never return, because the ABI does not provide any return address; returning would just crash the process. Instead, an exit-type syscall must be called at the end.)

The C source declares and calls a function asm_function, that takes an integer parameter, and returns a pointer to a string. Obviously, we'll implement this in assembly.

The C source also declares a function c_function, that we can call from assembly.

Here's the assembly part, asm.s:

# SPDX-License-Identifier: CC0-1.0

    .text
    .section    .rodata
.one:
    .string     "One"       # includes zero terminator

    .text
    .p2align    4,,15
    .globl      asm_function       #### visible to the linker

    .type       asm_function, @function
asm_function:
    cmpl    $1, %edi
    jne     .else
    leaq    .one(%rip), %rax
    ret

.else:
    subq    $8, %rsp              # 16B stack alignment for a call to C
    call    c_function
    addq    $8, %rsp
    ret

    .size   asm_function, .-asm_function

We don't need to declare c_function as an extern because GNU as treats all unknown symbols as external symbols anyway. We could add Call Frame Information directives, at least .cfi_startproc and .cfi_endproc, but I left them out so it wouldn't be so obvious I just wrote the original code in C and let GCC compile it to assembly, and then prettified it just a bit. (Did I write that out aloud? Oops! But seriously, compiler output is often a good starting point for a hand-written asm implementation of something, unless it does a very bad job of optimizing.)

The subq $8, %rsp adjusts the stack pointer so that it is a multiple of 16 at the point where c_function is called, as the ABI requires. (On x86-64, stacks grow down, so to reserve 8 bytes of stack, you subtract 8 from the stack pointer.) After the call returns, addq $8, %rsp restores the stack pointer to its original value.

With these four files, we're ready. To build the example binaries, run e.g.

reset ; make clean all

Running either ./static-prog or ./dynamic-prog will output

asm_function(0) returns 'C function', and asm_function(1) returns 'One'.

The two binaries are just 2 kB (static) and 6 kB (dynamic) in size or so, although you can make them even smaller by stripping unneeded stuff,

strip --strip-unneeded static-prog dynamic-prog

which removes about 0.5 kB to 1 kB of unneeded stuff from them – the exact amount varies depending on the version of GCC and Binutils you use.

On some other architectures, we'd need to also link against libgcc (via -lgcc), because some C features rely on internal GCC functions. 64-bit integer division (named udivdi or similar) on various architectures is a typical example.

As mentioned in the comments, the first version of the above examples had a few issues that need to be addressed. They do not stop the example from executing or working as intended, and were overlooked because the examples were written from scratch for this answer (in the hopes that others finding this question later on via web searches might find this useful), and I'm not perfect. :)

memory clobber argument to the inline assembly, in the syscall preprocessor macros

Adding "memory" to the clobber list tells the compiler that the inline assembly may access (read and/or write) memory other than that specified in the parameter lists. It is obviously needed for the write syscall, but it is actually important for all syscalls, because the kernel can deliver e.g. signals in the same thread before returning from the syscall, and signal delivery can/will access memory.

As the GCC documentation mentions, this clobber also behaves like a read/write memory barrier for the compiler (but NOT for the processor!). In other words, with the memory clobber, the compiler knows that it must write any changes in variables etc. to memory before the inline assembly, and that unrelated variables and other memory content (not explicitly listed in the inline assembly inputs, outputs, or clobbers) may also change, and will generate the code we actually want, without making incorrect assumptions.

-fPIC -pie: Omitted for simplicity

Position independent code is usually only relevant for shared libraries. In real projects' Makefiles, you will need to use a different set of compilation flags for objects that will be compiled as a dynamic library, static library, dynamically linked executable, or a static executable, as the desired properties (and therefore compiler/linker flags) vary.

In an example such as this one, it is better to try and avoid such extraneous things, as it is a reasonable question to ask on its own ("Which compiler options to use to achieve X, when needing Y ?"), and the answers depend on the required features and context.

In most modern distros, PIE is the default and you might want -fno-pie -no-pie to simplify debugging / disassembling. See "32-bit absolute addresses no longer allowed in x86-64 Linux?".

-nostdlib does imply (or "include") -nostartfiles

There are quite a few overall options and link options we can use to control how the code is compiled and linked.

Many of the options GCC supports are grouped. For example, -O2 is actually shorthand for a collection of optimization features that you can explicitly specify.

Here, the reason for keeping both is to remind human programmers of the expectations for the code: no standard library, and no start files/objects.

-march=x86-64 -mtune=generic -m64 is the default on x86-64

Again, this is kept more as a reminder of what the code expects. Without a specific architecture definition, one might get the wrong impression that the code should be compilable in general, because C typically is not architecture specific!

The nolib.h header file does contain preprocessor checks (using pre-defined compiler macros to detect the operating system and hardware architecture), halting the compilation with an error for other OSes and hardware architectures.

Most Linux distributions provide the syscall numbers in <asm/unistd.h>, as __NR_name.

These are derived from the actual kernel sources. However, for any given architecture, these are the stable userspace ABI, and will not change. New ones may be added. Only in some extraordinary circumstances (unfixable security holes, perhaps?) can a syscall be deprecated and stop functioning.

It is always better to use the syscall numbers from the kernel, preferably via the aforementioned header, but it's possible to build this program with only GCC, no glibc or Linux kernel headers installed. For someone writing their own standard C library, they should include the file (from Linux kernel sources).

I do know that Debian derivatives (Ubuntu, Mint, et cetera) all do provide the <asm/unistd.h> file, but there are many, many other Linux distributions, and I just am not sure about all of them. I opted to only define the two (exit_group and write), to minimize the risk of problems.

(Editor's note: the file might be in a different place in the filesystem, but the <asm/unistd.h> include path should always work if the right header package is installed. It's part of the kernel's user-space C/asm API.)

Compilation flag -g adds debug symbols, which helps greatly when debugging – for example, when running and examining the binary in gdb.

I omitted this and all related flags, because I did not want to expand the topic any further, and because this example is easily debugged at the asm level and examined even without them. See the GDB asm tips, like layout reg, at the bottom of the x86 tag wiki.

The System V ABI requires that before a call to a function, the stack is aligned to 16 bytes. So at the top of the function, RSP±8 is 16-byte aligned, and if there are any stack args, they'll be aligned.

The call instruction pushes the current instruction pointer to the stack, and because this is a 64-bit architecture, that too is 64 bits = 8 bytes. So, to conform to the ABI, we really need to adjust the stack pointer by 8 before calling the function, to ensure it too gets a properly aligned stack pointer. These were initially omitted, but are now included in the assembly (asm.s file).
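The arithmetic behind that adjustment can be checked with a few lines of C. This is only a worked illustration on plain integers, assuming a made-up stack pointer value; aligned16 and rsp_on_entry are hypothetical names and nothing here touches the real stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a stack pointer value meets the ABI's 16-byte rule for a call site. */
static inline bool aligned16(uint64_t rsp)
{
    return (rsp % 16) == 0;
}

/* %rsp as seen by the first instruction of the callee:
 * the call instruction has pushed an 8-byte return address. */
static inline uint64_t rsp_on_entry(uint64_t rsp_at_call)
{
    return rsp_at_call - 8;
}
```

Starting from an aligned value such as 0x7ffc00001000 at the call site, rsp_on_entry() yields an address ending in ...ff8, which is 8 modulo 16. Subtracting another 8 (the subq $8, %rsp in asm.s) gives ...ff0, which is aligned again and therefore safe for the nested call.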

This matters, because on x86-64, SSE/AVX SIMD vectors have different instructions for aligned-to-16-bytes and unaligned accesses, with the aligned accesses being significantly faster on certain processors. (See "Why does System V / AMD64 ABI mandate a 16 byte stack alignment?".) Using aligned SIMD instructions like movaps with unaligned addresses will cause the process to crash. (The question "glibc scanf Segmentation faults when called from a function that doesn't align RSP" is a real-life example of what happens when you get this wrong.)

However, when we do such stack manipulations, we really should add CFI (Call Frame Information) directives to ensure debugging and stack unwinding etc. works correctly. In this case, for general CFI, we prepend .cfi_startproc before the first instruction in an assembly function, and .cfi_endproc after the last instruction in an assembly function. For the Canonical Frame Address, CFA, we add .cfi_def_cfa_offset N after any instruction that modifies the stack pointer. Essentially, N is 8 at the beginning of the function, and increases as much as %rsp is decremented, and vice versa. See this article for more.

Internally, these directives produce information (metadata) stored in the .eh_frame and .eh_frame_hdr sections in the ELF object files and binaries, depending on other compilation flags.

So, in this case, the subq $8, %rsp should be followed by .cfi_def_cfa_offset 16, and the addq $8, %rsp by .cfi_def_cfa_offset 8, plus .cfi_startproc at the beginning of asm_function and .cfi_endproc after the final ret.
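Put together, the annotated function might look like the sketch below. To keep it buildable and testable as a single file, the assembly is embedded as top-level GNU inline assembly in a C source rather than kept in asm.s, and the local labels are renamed to .Lone and .Lelse; both choices are mine, not part of the original example. The #else branch is only a portable stand-in so the file still compiles on other targets.

```c
/* Sketch only: asm_function from asm.s with the CFI directives added. */

const char *c_function(void)
{
    return "C function";
}

const char *asm_function(int n);

#if defined(__x86_64__) && defined(__linux__)
__asm__(
    "    .pushsection .rodata\n"
    ".Lone:\n"
    "    .string \"One\"\n"
    "    .popsection\n"
    "    .pushsection .text\n"
    "    .p2align 4\n"
    "    .globl asm_function\n"
    "    .type asm_function, @function\n"
    "asm_function:\n"
    "    .cfi_startproc\n"             /* on entry, CFA = %rsp + 8     */
    "    cmpl $1, %edi\n"
    "    jne .Lelse\n"
    "    leaq .Lone(%rip), %rax\n"
    "    ret\n"
    ".Lelse:\n"
    "    subq $8, %rsp\n"              /* realign the stack for a call */
    "    .cfi_def_cfa_offset 16\n"     /* CFA is now %rsp + 16         */
    "    call c_function\n"
    "    addq $8, %rsp\n"
    "    .cfi_def_cfa_offset 8\n"      /* back to CFA = %rsp + 8       */
    "    ret\n"
    "    .cfi_endproc\n"
    "    .size asm_function, .-asm_function\n"
    "    .popsection\n"
);
#else
/* Stand-in with the same behaviour, for targets other than x86-64 Linux. */
const char *asm_function(int n)
{
    return n == 1 ? "One" : c_function();
}
#endif
```

The same directives can be pasted into asm.s unchanged; only the string quoting and the .pushsection/.popsection pairs are artifacts of embedding the code in C.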

Note that you can often see rep ret instead of just ret in assembly sources. This is nothing but a workaround to certain processors having branch-prediction performance issues when jumping to or falling through a JCC to a ret instruction. The rep prefix does nothing, except it does fix the issues those processors might otherwise have with such a jump. Recent GCC versions stopped doing this by default as the affected AMD CPUs are very old and not as relevant these days. See "What does `rep ret` mean?".

The "key" option, -ffreestanding, is one that chooses a C "dialect"

The C programming language is actually separated into two different environments: hosted, and freestanding.

The hosted environment is one where the standard C library is available, and is used when you write programs, applications, or daemons in C.

The freestanding environment is one where the standard C library is not available. It is used when you write kernels, firmware for microcontrollers or embedded systems, implement (parts of) your own standard C library, or a "standard library" for some other C-derived language.
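A program can check which of the two environments it is being compiled for through the standard __STDC_HOSTED__ macro, and a handful of headers such as <stddef.h>, <stdint.h> and <limits.h> remain usable either way because they only define types and macros. The following is a minimal sketch; c_environment and nolib_strlen are hypothetical names used only for illustration.

```c
#include <stddef.h>   /* size_t: provided even by a freestanding implementation */

/* Reports which environment the compiler was told to target. */
static inline const char *c_environment(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return "hosted";
#else
    return "freestanding";
#endif
}

/* Without a C library there is no strlen(), so we count the bytes ourselves. */
static inline size_t nolib_strlen(const char *s)
{
    const char *end = s;

    while (*end)
        end++;
    return (size_t)(end - s);
}
```

GCC sets __STDC_HOSTED__ to 0 when -ffreestanding is given, so the same source reports "freestanding" when built with the CFLAGS from the Makefile above.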

As an example, the Arduino programming environment is based on a subset of freestanding C++. The standard C++ library is not available, and many features of C++ like exceptions are not supported. In fact, it is very close to freestanding C with classes. The environment also uses a special pre-preprocessor, which for example automatically prepends declarations of functions without the user having to write them.

Probably the most well known example of freestanding C is the Linux kernel. Not only is the standard C library not available, but the kernel code must actually avoid floating-point operations as well, because of certain hardware considerations.

For a better understanding of what exactly the freestanding C environment looks like to a programmer, I think the best thing is to go look at the language standard itself. As of now (June 2020), the most recent standard is ISO C18. While the standard itself is not free, the final draft is; for C18, it is draft N2176 (PDF).

`-fPIC` is overkill for a PIE executable. It implies symbol interposition, not just position-independence. Use `-fPIE` if that's what you want. (Most modern distros make `-fPIE -pie` the default, and you have to explicitly use `-fno-pie -no-pie` to build a traditional ELF-type = EXEC executable.) — Peter Cordes, Jun 14 '20 at 21:50
`call c_function` / `ret` violates the ABI: at that point the stack isn't aligned by 16. Use `jmp c_function` to tail-call it safely, or adjust RSP around the `call`. — Peter Cordes, Jun 14 '20 at 21:52
`-nostdlib` implies `-nostartfiles`. (You can think of the CRT startup files as part of libc). Also, `-march=x86-64 -mtune=generic -m64` are the defaults. Not a bad thing to point out that `-march` and `-mtune` exist, though, I guess. — Peter Cordes, Jun 14 '20 at 23:47
In general it's great that you put this much effort into writing a tutorial and I'd like to upvote it. Let me know when the bugs are fixed (at least the possible correctness problems: `"memory"` clobber for [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259), and stack alignment before `call`) — Peter Cordes, Jun 14 '20 at 23:49
Also, don't `#define SYS_write` yourself. `#include <asm/unistd.h>` for `__NR_write`, or `sys/syscall.h` to also define `SYS_write`, the same macro you were using. — Peter Cordes, Jun 14 '20 at 23:53
I'd recommend `-g` as a gcc option, so you can more easily find stuff with a debugger. Also, for asm debugging, `-fno-pie -no-pie` tends to be easier: absolute addresses at link time means disassembly has them. Also `-static` will override `-pie`. You can use `-static-pie` if you want ASLR but no ELF interpreter. [What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd?](https://stackoverflow.com/q/61553723). Linking `-fPIE` code into a static executable is only slightly less efficient than `-fno-pie` for 64-bit mode, but it is less efficient. — Peter Cordes, Jun 14 '20 at 23:58
I really appreciate the effort you put into your answer, especially explaining why you put in all the little command switches, etc and what they do. It helps a lot. I still don't understand -pie, -fpic, -ffreestanding, and the memory clobber issues being discussed so I'm afraid I can't help in the above discussion. I'll probably have to do some additional digging to understand them. I'd ask you to further explain, but I feel a little guilty. It always amazes me how much I don't know. I'll wait until tomorrow evening to mark your answer as correct so that any addendums can be made if needed. — Echelon X-Ray, Jun 15 '20 at 04:39
@PeterCordes: Thank you! I agree with your points, and tried to apply them (or explain why I'd like to keep the tautological options). I've checked your answers here, and trust your opinion and expertise, so if you do find an error or omission, please edit this answer directly. (If it is a matter of suggestion or choice, then a comment or separate answer or suggestion might be better :). — Example, Jun 15 '20 at 13:17
@EchelonX-Ray: `-pie` and `-fpic` and related options are a side issue, related to shared libraries and address space layout randomization (ASLR), and not something a normal executable should worry about. I've added some further explanations or reasoning why the options are used (or now omitted). If you'd like some point expanded, then please do say so; others are probably wondering at the same thing. I've done only a little bit of this, so if Peter Cordes points out something, I believe they have much more experience with this than I, so please do trust them over me :). — Example, Jun 15 '20 at 13:21
I made some tweaks, including explaining some history to answer @EchelonX-Ray's specific question about why the `ld` default path for the ELF interpreter doesn't work. But also, CFI in this context doesn't stand for Control Flow Integrity. That's an unrelated thing with the same acronym. — Peter Cordes, Jun 15 '20 at 14:17
@PeterCordes: Thanks! Right, "Call Frame Information", not "Control Flow Integrity"! :) — Example, Jun 15 '20 at 15:23
@Example I had a busy week, but I'm working with the code you gave me now. I was wondering if the "cc" clobber should be used on the syscalls as well? Per the comment documentation on line 69 here https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S , there is mention of the syscalls clearing the rflags. — Echelon X-Ray, Jun 20 '20 at 15:04

Peter Cordes · Answer 2 · 2020-06-15T15:46:36.300

The ld default path for ld.so (the ELF interpreter) isn't the one used on modern x86-64 GNU/Linux systems.

/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2.

There was never a good time to update the default in GNU binutils ld; when some systems were using the default, changing it would have broken them. Multi-arch systems had to configure their GCC to pass -dynamic-linker /some/path to ld, so they simply did that instead of asking and waiting for the ld default to change. So nobody ever needed the ld default to change to make anything work, except for people playing around with assembly and using ld by hand to create dynamically-linked executables.

Instead of doing that, you can link using gcc -nostartfiles to omit CRT start code which defines a _start, but still link with the normal libraries including -lc, -lgcc internal helper functions if needed, etc.

See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) for more info on assembling with/without libc for asm that defines _start, or with libc + CRT for asm that defines main. (Leave out the -m32 from that answer for 64-bit; when using gcc to invoke as and ld for you, that's the only difference.)

ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
doesn't link because you put -lc before the object files that reference symbols in libc.

Order matters in linker command lines, for static libraries.

However, ld -static -e my_entry_pt ./callee.obj ./caller.obj -lc -o ./prog.out will link, but it makes a program that segfaults when it calls glibc functions like write without having called glibc's init functions.

Dynamic linking takes care of that for you (glibc has .init functions that get called by the dynamic linker, the same mechanism that allows C++ static initializers to run in a C++ shared library). CRT startup code also calls those functions in the right order, but you left that out, too, and wrote your own entry point.

@Example's answer avoids that problem by defining its own write wrapper instead of linking with -lc, so it can be truly freestanding.

I thought glibc's write wrapper function would be simple enough not to crash, but that's not the case. It checks if the program is multi-threaded or something by loading from %fs:0x18. The kernel doesn't init FS base for thread-local storage; that's something user-space (glibc's internal init functions) would have to do.

glibc's write() faults on mov %fs:0x18,%eax if you haven't called glibc's init functions. (In a statically-linked executable where glibc couldn't get the dynamic linker to run them for you.)

Dump of assembler code for function write:
=> 0x0000000000401040 <+0>:     endbr64                 # for CET, or NOP on CPUs without CET
   0x0000000000401044 <+4>:     mov    %fs:0x18,%eax    ### this faults with no TLS setup
   0x000000000040104c <+12>:    test   %eax,%eax
   0x000000000040104e <+14>:    jne    0x401060 <write+32>
   0x0000000000401050 <+16>:    mov    $0x1,%eax        # simple case: EAX = __NR_write
   0x0000000000401055 <+21>:    syscall 
   0x0000000000401057 <+23>:    cmp    $0xfffffffffffff000,%rax
   0x000000000040105d <+29>:    ja     0x4010b0 <write+112>        # update errno on error
   0x000000000040105f <+31>:    retq                               # else return

   0x0000000000401060 <+32>:    sub    $0x28,%rsp               # the non-simple case:
   0x0000000000401064 <+36>:    mov    %rdx,0x18(%rsp)          # write is an async cancellation point or something
   0x0000000000401069 <+41>:    mov    %rsi,0x10(%rsp)
   0x000000000040106e <+46>:    mov    %edi,0x8(%rsp)
   0x0000000000401072 <+50>:    callq  0x4010e0 <__libc_enable_asynccancel>
   0x0000000000401077 <+55>:    mov    0x18(%rsp),%rdx
   0x000000000040107c <+60>:    mov    0x10(%rsp),%rsi
   0x0000000000401081 <+65>:    mov    %eax,%r8d
   0x0000000000401084 <+68>:    mov    0x8(%rsp),%edi
   0x0000000000401088 <+72>:    mov    $0x1,%eax
   0x000000000040108d <+77>:    syscall 
   0x000000000040108f <+79>:    cmp    $0xfffffffffffff000,%rax
   0x0000000000401095 <+85>:    ja     0x4010c4 <write+132>
   0x0000000000401097 <+87>:    mov    %r8d,%edi
   0x000000000040109a <+90>:    mov    %rax,0x8(%rsp)
   0x000000000040109f <+95>:    callq  0x401140 <__libc_disable_asynccancel>
   0x00000000004010a4 <+100>:   mov    0x8(%rsp),%rax
   0x00000000004010a9 <+105>:   add    $0x28,%rsp
   0x00000000004010ad <+109>:   retq   
   0x00000000004010ae <+110>:   xchg   %ax,%ax

   0x00000000004010b0 <+112>:   mov    $0xfffffffffffffffc,%rdx   # errno update for the simple case
   0x00000000004010b7 <+119>:   neg    %eax
   0x00000000004010b9 <+121>:   mov    %eax,%fs:(%rdx)          # thread-local errno?
   0x00000000004010bc <+124>:   mov    $0xffffffffffffffff,%rax
   0x00000000004010c3 <+131>:   retq

   0x00000000004010c4 <+132>:   mov    $0xfffffffffffffffc,%rdx   # same for the async case
   0x00000000004010cb <+139>:   neg    %eax
   0x00000000004010cd <+141>:   mov    %eax,%fs:(%rdx)
   0x00000000004010d0 <+144>:   mov    $0xffffffffffffffff,%rax
   0x00000000004010d7 <+151>:   jmp    0x401097 <write+87>

I don't fully understand what exactly write is checking for or doing. It may have something to do with async I/O, and/or POSIX thread cancellation points.
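For orientation, here is a hedged sketch of what lives at the start of the block that FS base points to in glibc on x86-64. The field order is my reading of glibc's tcbhead_t and should be treated as an assumption rather than a documented interface; the struct and helper names are made up. If it is right, offset 0x18 is a "multiple threads" flag, which fits the single-threaded fast path followed by the cancellation-aware slow path in the disassembly above.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: simplified layout of the first fields of glibc's x86-64
 * thread control block; fixed-width types stand in for the pointers. */
struct tcb_head_sketch {
    uint64_t tcb;               /* 0x00: pointer to the TCB itself      */
    uint64_t dtv;               /* 0x08: dynamic thread vector          */
    uint64_t self;              /* 0x10: pointer to thread descriptor   */
    int32_t  multiple_threads;  /* 0x18: what mov %fs:0x18,%eax reads   */
    int32_t  gscope_flag;       /* 0x1c                                 */
    uint64_t sysinfo;           /* 0x20                                 */
    uint64_t stack_guard;       /* 0x28: the stack-protector canary     */
};

_Static_assert(offsetof(struct tcb_head_sketch, multiple_threads) == 0x18,
               "the flag is expected at the offset write() loads from");

/* Mirrors the test %eax,%eax / jne pair: zero means the simple path. */
static inline int takes_simple_path(const struct tcb_head_sketch *t)
{
    return t->multiple_threads == 0;
}
```

With FS base left at 0 by the kernel, the load from %fs:0x18 is a read from address 0x18, which is unmapped, hence the segfault before the syscall is ever reached.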

@JosephSible-ReinstateMonica: thanks, good catch. I was mixing up my br vs. bnd0 or something, should have googled instead of taking a guess. I haven't done much of anything with CET or MPX. — Peter Cordes, Jun 15 '20 at 15:03
