Getting printf in assembly with only system calls?

Question

I am looking to understand the printf() statement at the assembly level. However, most assembly programs simply call an external print function, whose dependency is satisfied by some other object file that the linker adds. I would like to know what is inside that print function in terms of system calls and very basic assembly code. I want a piece of assembly code for printf where the only external calls are the system calls. I'm thinking of something like a disassembled object file. Where can I get something like that?

In most cases, the printf() code just calls an OS write() via putchar(). — Martin James, Jan 23 '16 at 13:26
But how can I possibly disassemble something on my Linux system and see how it is done? So putchar() is just one write()? — user2277550, Jan 23 '16 at 13:38

score 5 · Accepted Answer · edited May 23 '17 at 12:23

I would suggest instead staying at the C level first, and studying the source code of an existing free-software C standard library implementation on Linux. Look into the source code of musl-libc or of GNU libc (a.k.a. glibc). You'll see that several intermediate (usually internal) functions sit between printf and the basic system calls (listed in syscalls(2)). Also use strace(1) on a sample C program that calls printf (e.g. the usual hello-world example).
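A minimal test subject for strace might look like the sketch below (hello_twice is an invented name). Two separate printf calls usually end up as a single write(2), because stdio first collects the bytes in stdout's buffer; building a program that calls this function and running it under strace -e trace=write lets you check what your own libc actually does.

```c
#include <stdio.h>

/* Two printf calls; stdio normally only copies the bytes into
 * stdout's buffer, so the C library typically issues a single
 * write(2) for the whole "hello, world\n" line. */
int hello_twice(void)
{
    int n = 0;
    n += printf("hello, ");   /* returns 7, the number of bytes formatted */
    n += printf("world\n");   /* returns 6 */
    fflush(stdout);           /* hand whatever is still buffered to write(2) */
    return n;
}
```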

In particular, musl-libc has a very readable stdio/printf.c implementation, but you'll need to follow several other C functions there before reaching the write(2) syscall. Notice that some buffering is involved. See also setvbuf(3) & fflush(3). Several answers (e.g. this and that one) explain the chain between functions like printf and system calls (up to kernel code).
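To make the buffering idea concrete, here is a deliberately tiny, hypothetical stream (struct outbuf, ob_puts and ob_flush are invented names, not musl or glibc internals). It copies bytes into a buffer and only calls write(2) when the buffer is full or when the caller flushes, in the spirit of fflush(3); real stdio adds line buffering for terminals, locking, error flags and much more.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical miniature of a stdio stream. */
struct outbuf {
    int fd;             /* file descriptor that flushes go to */
    size_t len;         /* bytes currently buffered */
    unsigned writes;    /* write(2) calls issued so far, for illustration */
    char data[64];
};

/* Drain the buffer with write(2), looping over partial writes. */
static int ob_flush(struct outbuf *ob)
{
    size_t off = 0;
    while (off < ob->len) {
        ssize_t w = write(ob->fd, ob->data + off, ob->len - off);
        ob->writes++;
        if (w < 0)
            return -1;
        off += (size_t)w;
    }
    ob->len = 0;
    return 0;
}

/* Append a string; only touches the kernel when the buffer fills up. */
static int ob_puts(struct outbuf *ob, const char *s)
{
    size_t n = strlen(s);
    while (n > 0) {
        if (ob->len == sizeof ob->data && ob_flush(ob) < 0)
            return -1;
        size_t room = sizeof ob->data - ob->len;
        size_t chunk = n < room ? n : room;
        memcpy(ob->data + ob->len, s, chunk);
        ob->len += chunk;
        s += chunk;
        n -= chunk;
    }
    return 0;
}
```

Three short ob_puts calls followed by one ob_flush produce a single write(2), which is the same effect strace typically shows for consecutive printf calls.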

> I want a piece of assembly code where the only external calls are the system calls, for printf

If you want exactly that, you could start from musl-libc's stdio/printf.c and add further source files from musl-libc until no external undefined symbols remain, then compile all of them with gcc -flto -O2 (and perhaps also -S). You will probably end up with a significant part of musl-libc in object (or assembly) form, because printf may call malloc and many other functions! I'm not sure it is worth the pain.

You could also statically link your libc (e.g. libc.a). The linker will then pull in only the static library members needed by printf (and by any other functions you call).

To be picky, system calls are not actually external calls (your libc write function is just a tiny wrapper around the raw system call). You could make them yourself with the SYSCALL machine instruction on x86_64, or SYSENTER on 32-bit x86 (where going through vdso(7) is preferable: more portable, and perhaps quicker), and on x86_64 you don't even need a valid stack pointer to make a system call.
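As a rough illustration of how thin that wrapper is, here is a sketch assuming x86_64 Linux, where write is system call number 1, its three arguments go in rdi, rsi and rdx, and the kernel clobbers rcx and r11. The name raw_write is invented for this example; on other targets the sketch falls back to the generic syscall(2) helper, which performs the same trap for the current architecture.

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__linux__)
/* Issue the SYSCALL instruction directly: no libc function in the path. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                      /* rax: result */
                      : "a"(1L),                       /* rax: __NR_write on x86_64 */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");       /* clobbered by SYSCALL */
    return ret;   /* bytes written, or -errno on failure */
}
#else
#include <unistd.h>
#include <sys/syscall.h>
/* Fallback: let syscall(2) pick the right trap instruction. */
static long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);   /* -1 and errno on failure */
}
#endif
```

Note the two branches report errors differently: the raw instruction hands back a negative errno value, and it is the libc wrapper's job to turn that into -1 plus errno.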

You can write Linux user-level programs without using libc at all; the bones implementation of Scheme is one such program (and you'll find others).

e0k · Answer 2 · 2016-01-23T14:20:36.663

The function printf() is in the standard C library, so it is linked into your program and not copied into it. Dynamically linked libraries save memory because the same code is not duplicated in resident memory for every program that uses it.

Think about what printf() does. Interpreting the format string and generating the correct output is fairly complex. The family of functions that printf() belongs to also buffers its output. You probably don't really want to re-implement all of this in assembly. The standard C library is omnipresent, and probably available to you.
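To get a feel for the "interpreting the format string" part, here is a hypothetical, heavily reduced formatter (mini_format is an invented name). It understands only %s, %d and %%, with no widths, flags, precision or floating point, yet it already needs a directive parser, a varargs walk and an integer-to-decimal conversion. Unlike snprintf(3), it returns the number of bytes actually stored, not the length the full output would have had.

```c
#include <stdarg.h>
#include <stddef.h>

/* Writes at most cap-1 bytes plus a terminating NUL into out. */
static size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    /* Store one byte if there is room, always keeping space for the NUL. */
#define PUT(c) do { if (n + 1 < cap) out[n++] = (c); } while (0)

    va_start(ap, fmt);
    for (const char *p = fmt; *p; p++) {
        if (*p != '%') {
            PUT(*p);
            continue;
        }
        switch (*++p) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s) { PUT(*s); s++; }
            break;
        }
        case 'd': {
            int v = va_arg(ap, int);
            /* Negate in unsigned arithmetic so INT_MIN is handled too. */
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char digits[12];
            int k = 0;
            do { digits[k++] = (char)('0' + u % 10); u /= 10; } while (u);
            if (v < 0)
                PUT('-');
            while (k > 0) { k--; PUT(digits[k]); }   /* digits were reversed */
            break;
        }
        case '%':
            PUT('%');
            break;
        default:   /* unknown directive or '%' at the end: stop here */
            goto done;
        }
    }
done:
    va_end(ap);
    if (cap > 0)
        out[n] = '\0';
#undef PUT
    return n;
}
```

For example, mini_format(buf, sizeof buf, "%s=%d (100%%)", "x", -42) stores "x=-42 (100%)". Everything real printf adds on top (field widths, padding, %f, locale handling, buffered output) is why its implementation is so much larger.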

Maybe you're looking for write(2), which is the system call for unbuffered writes of raw bytes to a file descriptor. You'd have to generate the string to print beforehand and format it yourself. (See also open(2) for opening files.)
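If you go the write(2) route, keep in mind that write may transfer fewer bytes than requested (for example to a pipe, or when a signal arrives), so code that formats its own text usually loops until everything is out. A sketch (write_all is a common idiom name, not a standard function):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Keep calling write(2) until all len bytes are written.
 * Returns 0 on success, -1 on error (errno set by write). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte was written: retry */
            return -1;
        }
        buf += w;           /* skip what the kernel accepted */
        len -= (size_t)w;
    }
    return 0;
}
```

Usage: write_all(1, "formatted by hand\n", 18) sends a pre-built string to standard output with no stdio involved.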

To disassemble a binary, you can use objdump:

    objdump -d binary

where binary is some compiled binary. This shows the opcodes alongside human-readable instructions. You'll probably want to redirect the output to a file and read it elsewhere.

You can disassemble the standard C library binary on your system and try to interpret it if you want (though I strongly advise against it). The problem is that it will be far too complex to understand. Things like printf() were written in C, then compiled and assembled. You can't (within a reasonable number of decades) recover the high-level structure from the assembly of a compiled, non-trivial program. If you really want to try this, good luck.

An easier thing to do is to look at the C source code for printf() itself. The real work is actually done in vfprintf() which is in stdio-common/vfprintf.c of the GNU C library source code.
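The printf() entry point itself is typically a thin shell around vfprintf(). For instance, musl's stdio/printf.c does essentially the following; the sketch uses the invented name my_printf so it can coexist with the real printf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Package the variable arguments into a va_list and let vfprintf
 * do all the parsing, formatting and buffering on stdout. */
static int my_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int ret = vfprintf(stdout, fmt, ap);
    va_end(ap);
    return ret;   /* bytes written, or a negative value on error */
}
```

This is why reading printf.c alone tells you little: the interesting code lives in vfprintf and the stdio buffering layer underneath it.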

I don't want to reimplement it, and I know why it is done that way. I just want to see how it is implemented internally. — user2277550, Jan 23 '16 at 13:37
*"is in the standard C library, so it is linked into your program and not copied into it"* is not necessarily true. Many older systems link the entire runtime library in to build the executable file. Shareable, dynamically linked libraries are a relatively recent innovation. Since the OP doesn't specify any particular environment, it is inaccurate to assume a modern efficient environment. — wallyk, Jan 23 '16 at 15:01
