
I'm interested in building a static ELF program without (g)libc, using unistd.h provided by the Linux headers.

I've read through these articles and questions, which give a rough idea of what I'm trying to do, but not quite: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Compiling without libc

https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free

I have basic code which depends only on unistd.h. My understanding is that each of those functions is provided by the kernel, and that libc should not be needed. Here's the path I've taken that seems the most promising:

    $ gcc -I /usr/include/asm/ -nostdlib grabbytes.c -o grabbytesstatic
    /usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
    /tmp/ccn1mSkn.o: In function `main':
    grabbytes.c:(.text+0x38): undefined reference to `open'
    grabbytes.c:(.text+0x64): undefined reference to `lseek'
    grabbytes.c:(.text+0x8f): undefined reference to `lseek'
    grabbytes.c:(.text+0xaa): undefined reference to `read'
    grabbytes.c:(.text+0xc5): undefined reference to `write'
    grabbytes.c:(.text+0xe0): undefined reference to `read'
    collect2: error: ld returned 1 exit status

Before this, I had to manually define SEEK_END and SEEK_SET according to the values found in the kernel headers. Otherwise the compiler complained that they were not defined, which makes sense.

I imagine that I need to link into an unstripped vmlinux to provide the symbols to utilize. However, I read through the symbols and while there were plenty of llseeks, they were not llseek verbatim.

So my question can go in a few directions:

How can I specify an ELF file to utilize symbols from? And I'm guessing if/how that's possible, the symbols won't match up. If this is correct, is there an existing header file which will redefine llseek and default_llseek or whatever is exactly in the kernel?

Is there a better way to write Posix code in C without a libc?

My goal is to write or port fairly standard C code using (perhaps solely) unistd.h and invoke it without libc. I'm probably okay without a few unistd functions, and am not sure which ones exist "purely" as kernel calls or not. I love assembly, but that's not my goal here. Hoping to stay as strictly C as possible (I'm fine with a few external assembly files if I have to), to allow for a libc-less static system at some point.

Thank you for reading!

sega01
  • I initially thought you wanted to use this static binary from userspace (in which case the answer is that you need syscall wrappers if you want to use syscalls, either from libc or else write your own). But then you mentioned linking against the (unstripped) kernel so I guess you expect to run this code directly on bare metal (i.e. instead of the kernel). Please clarify your question on this point. – Celada Jan 18 '13 at 21:33
  • Thanks for replying! I meant to link using the kernel as a symbol table reference and run it in the userland of the Linux host. I'll search for existing syscall wrappers and see if one is similar to what I'm trying to do. – sega01 Jan 18 '13 at 22:01
  • OK, well if you intend to run it in userspace then you can't link against the kernel (if successful, that would pull the in-kernel implementations of those system calls into your code, which is not what you want: you want to call into the kernel). You must implement `open()` and `read()` yourself by invoking the proper actions as specified by the kernel ABI, which usually involves setting up registers and then executing some kind of CPU trap instruction. The problem is that the details of this are EXTREMELY architecture-specific (ARM vs. x86, etc...) and complicated by things like vsyscalls. – Celada Jan 18 '13 at 22:23
  • I don't see the point of doing this. Use libc - there is not much overhead if you are not using complicated functions - use -static, and you will get a binary that contains only the functions you want. What exactly is the purpose of not using libc? Note that you can't call the kernel from usermode without some sort of syscall wrapping - as you do need the appropriate calling method to transition from user-mode to kernel mode - this can not be done in pure C, needs to be written in assembler for the appropriate processor [and is subject to change if the kernel changes]. – Mats Petersson Jan 18 '13 at 22:48
  • I guess that the ideal scenario would be inline assembly header files. I found this [question](http://stackoverflow.com/questions/7064805/how-to-compile-c-program-so-that-it-doesnt-depend-on-any-library) and it produces a result which is almost what I want, but I can't get argc/argv working with void _start(). @MatsPetersson: There's a lot of overhead with glibc, it results in 800KB or larger files, even if only using unistd.h. As far as I know, everything in my code is just syscalls, so I don't see why I can't simply have gcc generate code calling those directly through the Linux headers. – sega01 Jan 18 '13 at 22:55
  • If you get 800kb from linking glibc, something is being dragged in that isn't necessary - or you are doing something wrong. – Mats Petersson Jan 18 '13 at 23:10
  • So, I just did some experiments, and I believe it's mainly the C startup code that drags in a huge amount of other code. So if you can get rid of (some of) that, then you should be able to reduce the code. I will work at it for a bit and get back to you. The alternative is drag out the parts of glibc that you want and link with those... [But you still need to get rid of any code that calls any other functions, like the C startup code, which calls a large number of other functions, that drag in the kitchen sink] – Mats Petersson Jan 18 '13 at 23:29

2 Answers


If you're looking to write POSIX code in C, the abandonment of libc is not going to be helpful. Although you could implement a syscall function in assembler, and copy structures and defines from the kernel header, you would essentially be writing your own libc, which almost certainly would not be POSIX compliant. With all the great libc implementations out there, there's almost no reason to begin implementing your own.
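
For a sense of what "implementing a syscall function" involves, here is a minimal sketch for x86-64 Linux only, assuming GCC or Clang inline assembly. The names `raw_syscall3` and `my_*` are made up for illustration; the syscall numbers come from `<asm/unistd.h>` in the kernel headers. Unlike the libc wrappers, these return the kernel's raw result, so a failure shows up as a negative errno value and no `errno` variable is involved.

```c
/* Sketch only: x86-64 Linux, GCC/Clang inline asm. Names are hypothetical. */
#include <stddef.h>       /* size_t (compiler-provided header) */
#include <asm/unistd.h>   /* __NR_read, __NR_write, ... from the kernel headers */

#if !defined(__x86_64__)
#error "this sketch only covers the x86-64 syscall convention"
#endif

/* Number in %rax, arguments in %rdi, %rsi, %rdx; the syscall instruction
 * clobbers %rcx and %r11. The result (or -errno) comes back in %rax. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}

long my_open(const char *path, int flags, int mode)
{
    return raw_syscall3(__NR_open, (long)path, flags, mode);
}

long my_read(int fd, void *buf, size_t len)
{
    return raw_syscall3(__NR_read, fd, (long)buf, (long)len);
}

long my_write(int fd, const void *buf, size_t len)
{
    return raw_syscall3(__NR_write, fd, (long)buf, (long)len);
}

long my_lseek(int fd, long offset, int whence)
{
    return raw_syscall3(__NR_lseek, fd, offset, whence);
}

long my_close(int fd)
{
    return raw_syscall3(__NR_close, fd, 0, 0);
}
```

Every other architecture needs its own version of `raw_syscall3`, and everything POSIX layers on top (errno, cancellation, struct layouts) is still missing, which is the part a small libc already does for you.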

dietlibc and musl libc are both frugal libc implementations which yield impressively small binaries. The linker is generally smart: as long as a library is written to avoid accidentally pulling in numerous dependencies, only the functions you use will actually be linked into your program.

Here is a simple hello world program:

    #include <unistd.h>

    int main(void){
        char str[] = "Hello, World!\n";
        write(1, str, sizeof str - 1);
        return 0;
    }

Compiling it with musl as shown below yields a binary of less than 3K:

    $ musl-gcc -Os -static hello.c
    $ strip a.out
    $ wc -c a.out
    2800 a.out

dietlibc produces an even smaller binary, less than 1.5K:

    $ diet -Os gcc hello.c
    $ strip a.out
    $ wc -c a.out
    1360 a.out
Dave

This is far from ideal, but a little bit of (x86_64) assembler has me down to just under 5KB. Most of that is "other things than code": the actual code is under 1KB (771 bytes, to be precise), but the file is much larger, I think because the code size is rounded up to 4KB and then some header/footer/extra stuff is added on top of that.

Here's what I did:

    gcc -g -static -nostdlib -o glibc start.s glibc.c -Os -lc

glibc.c contains:

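    #include <unistd.h>

    int main(void)
    {
        const char str[] = "Hello, World!\n";
        /* sizeof(str) counts the terminating NUL, so subtract 1 */
        write(1, str, sizeof(str) - 1);

        _exit(0);   /* the _exit from start.s below; never returns */
    }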

start.s contains:

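        .globl _start
    _start:
        xor %ebp, %ebp      # clear the frame pointer to mark the outermost frame
        mov %rdx, %r9       # keep the %rdx value received at entry (unused here)
        mov %rsp, %rdx
        and $~15, %rsp      # round %rsp down to a 16-byte boundary
        push    $0
        push    %rsp        # two 8-byte pushes keep %rsp 16-byte aligned

        call    main

        hlt                 # faults if main ever returns


        .globl _exit
    _exit:
        # We know %rdi already has the exit code...
        mov $0x3c, %eax     # 0x3c = 60 = exit on x86_64
        syscall
        hlt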

The main point of this is to show that it's not the system call part of glibc that takes up a lot of space, but the "prepare things" part. Beware that if you were to call, for example, printf, possibly even (v)sprintf, or exit(), or any other "standard library" function, you are in the land of "nobody knows what will happen".

Edit: Updated "start.s" to put argc/argv in the right places:

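    _start:
        xor %ebp, %ebp
        mov %rdx, %r9
        pop     %rdi        # argc is the first value on the initial stack
        mov %rsp, %rsi      # argv starts right after it
        and $~15, %rsp      # round %rsp down to a 16-byte boundary
        push    %rax        # padding
        push    %rsp        # two 8-byte pushes keep %rsp 16-byte aligned

        # %rdi = argc, %rsi = argv
        call    main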

Note that I've changed which register contains what, so that it matches main - I had them in slightly the wrong order in the previous code.
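
For anyone who finds the assembly opaque, the same `pop %rdi` / `mov %rsp, %rsi` step can be written as plain pointer arithmetic. This is only a sketch, assuming x86-64 Linux (8-byte stack slots) and assuming `sp` holds the value `%rsp` had on entry to `_start`; `struct start_args` and `args_from_stack` are made-up names.

```c
/* Sketch only: hypothetical helper, x86-64 Linux process entry layout.
 * sp is assumed to be the value %rsp had when _start was entered. */
struct start_args {
    long   argc;
    char **argv;
};

struct start_args args_from_stack(char **sp)
{
    struct start_args a;
    a.argc = (long)sp[0];   /* first 8-byte slot: argc, stored as an integer */
    a.argv = sp + 1;        /* following slots: argv[0..argc-1], then NULL */
    return a;
}
```

The assembly stub is still needed to capture `%rsp` before anything else touches the stack; this only shows what the two instructions compute.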

Mats Petersson
  • Thank you! Your solution is quite close to this [one](http://stackoverflow.com/questions/7064805/how-to-compile-c-program-so-that-it-doesnt-depend-on-any-library). I can confirm that this works on my end, however argc/argv passing does not work. Do you know of a good resource that I can look through for having argc/argv support in the start.s assembly section? I'm not familiar with how argc/argv works exactly. – sega01 Jan 19 '13 at 00:46
  • I don't know, but I'm fairly sure argc/argv are passed from the kernel in registers. I'll have a dig. – Mats Petersson Jan 19 '13 at 00:48
  • I've edited in a new "_start" function. Don't ask me how you do "environ" tho', I'm not sure that's as easy. – Mats Petersson Jan 19 '13 at 01:20
  • Thank you so much for your help, Mats! That works perfectly. Interestingly, it segfaults if open() is called on a nonexistent file (dynamic version does not), but that is a task for another day. – sega01 Jan 19 '13 at 03:45
  • Highly likely that the glibc is using some uninitialized variables or some such in that case. Like I tried to say, this is not something I would actually recommend doing... – Mats Petersson Jan 19 '13 at 12:46
  • -1: This absolutely brings in glibc with `-lc`. The solution to OP's problem is not to break glibc by not allowing it to initialize itself in a quest for smaller binary size. Instead an embedded libc like musl or dietlibc should be used. – Dave Apr 29 '13 at 16:57
  • @Dave: Did I even say it doesn't bring in glibc? And I think at least the comment above yours explains that this is a bad thing to do. My point was that "it's the other stuff in glibc that causes the fat, not the system calls". – Mats Petersson Apr 29 '13 at 17:30
  • My point is that even `write()` is a "standard library" function and that not linking `_start()` is pretty much indefensible. And, if your binary is still 5K, then glibc's `write()` is absolutely bloated. – Dave Apr 29 '13 at 17:51
  • Did you not read the next sentence or two: "the actual code is under 1KB [771 bytes to be precise]". The rest is various other padding/symbols and stuff... I never did strip the binary, for example. I agree that "not linking _start" is a bad idea. And I said as much. Using a "light-weight" C library is a good solution, I'm not arguing that. – Mats Petersson Apr 29 '13 at 17:59
  • I did, but I also read the hand-waving you used to justify the 5K filesize. The dietlibc file, properly initialized and unstripped comes to 2.7K. I do find it interesting to see exactly how much glibc bloat is actually from the startup. I think that what you do could be more explicit from the start, and that `-nostartfiles ` should be used rather than `-nostdlib -lc` – Dave Apr 29 '13 at 18:15
  • So, using `-nostartfiles` with the "start.s" from above (as you need a `_start` somewhere), after `strip a.out`, with `-nostartfiles` the file is 11584 bytes. That's 608 bytes of "text" and 12 bytes of "data", a total of 620 bytes. – Mats Petersson Apr 29 '13 at 22:45
  • The envp array starts after the argv array. Once you have argc and argv loaded in rdi and rsi, 'lea 8(%rsi, %rdi, 8), %rdx' should load envp into rdx. [example](https://github.com/eloj/nolibc-example) – eloj Sep 20 '15 at 03:56
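
The `lea 8(%rsi, %rdi, 8), %rdx` in the last comment is `argv + argc + 1` in C terms: skip the `argc` argument pointers plus the NULL that terminates argv. A minimal sketch follows, assuming x86-64 Linux; `envp_from_args` and `env_lookup` are hypothetical names, and the lookup is a bare stand-in for getenv that needs no libc.

```c
/* Sketch only: hypothetical helpers, no libc calls. */

/* envp begins one slot past argv's terminating NULL. */
char **envp_from_args(long argc, char **argv)
{
    return argv + argc + 1;
}

/* Return the value of NAME from "NAME=value" entries, or a null pointer. */
const char *env_lookup(char **envp, const char *name)
{
    for (; *envp; envp++) {
        const char *e = *envp;
        const char *n = name;
        while (*n && *e == *n) {   /* match the name prefix */
            e++;
            n++;
        }
        if (*n == '\0' && *e == '=')
            return e + 1;          /* value starts after '=' */
    }
    return 0;
}
```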