Why do I get a zombie when I link assembly code without stdlib?

Question

I was experimenting with assembly code and the GTK+ 3 libraries when I discovered that my application turns into a zombie if I don't link the object file with gcc against the standard library. Here is my code for the stdlib-free application:

%include "gtk.inc"
%include "glib.inc"

global _start

SECTION .data    
destroy         db "destroy", 0     ; const gchar*
strWindow       db "Window", 0              ; const gchar*

SECTION .bss    
window         resq 1 ; GtkWindow *

SECTION .text    
_start:
    ; gtk_init (&argc, &argv);
    xor     rdi, rdi
    xor     rsi, rsi
    call    gtk_init

    ; window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    xor     rdi, rdi
    call    gtk_window_new
    mov     [window], rax

    ; gtk_window_set_title (GTK_WINDOW (window), "Window");
    mov     rdi, rax
    mov     rsi, strWindow
    call    gtk_window_set_title

    ; g_signal_connect (window, "destroy", G_CALLBACK (gtk_main_quit), NULL);
    mov     rdi, [window]
    mov     rsi, destroy
    mov     rdx, gtk_main_quit
    xor     rcx, rcx
    xor     r8, r8
    xor     r9, r9
    call    g_signal_connect_data

    ; gtk_widget_show (window);
    mov     rdi, [window]
    call    gtk_widget_show

    ; gtk_main ();
    call    gtk_main

    mov     rax, 60 ; SYS_EXIT
    xor     rdi, rdi
    syscall

And here is the same code, meant to be linked against the standard library:

%include "gtk.inc"
%include "glib.inc"

global main

SECTION .data    
destroy         db "destroy", 0     ; const gchar*
strWindow       db "Window", 0              ; const gchar*

SECTION .bss
window         resq 1 ; GtkWindow *

SECTION .text    
main:
    push    rbp
    mov     rbp, rsp

    ; gtk_init (&argc, &argv);
    xor     rdi, rdi
    xor     rsi, rsi
    call    gtk_init

    ; window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    xor     rdi, rdi
    call    gtk_window_new
    mov     [window], rax

    ; gtk_window_set_title (GTK_WINDOW (window), "Window");
    mov     rdi, rax
    mov     rsi, strWindow
    call    gtk_window_set_title

    ; g_signal_connect (window, "destroy", G_CALLBACK (gtk_main_quit), NULL);
    mov     rdi, [window]
    mov     rsi, destroy
    mov     rdx, gtk_main_quit
    xor     rcx, rcx
    xor     r8, r8
    xor     r9, r9
    call    g_signal_connect_data

    ; gtk_widget_show (window);
    mov     rdi, [window]
    call    gtk_widget_show

    ; gtk_main ();
    call    gtk_main

    pop     rbp
    ret

Both applications create a GtkWindow. However, the two behave differently when the window is closed. The former leads to a zombie process and I need to press Ctrl+C. The latter exhibits the expected behaviour, i.e. the application terminates as soon as the window is closed.

My feeling is that the standard lib is performing some essential operations that I am neglecting in the first code sample, but I can't tell what it is.

So my question is: what's missing in the first code sample?

`mov rax, 60` / `xor rdi, rdi` / `syscall` (SYS_EXIT) short circuits the normal shutdown procedure. Since you don't show how you assemble/link and this isn't a minimal verifiable example (headers you use aren't part of question) it is hard to say. One possibility is that _C_ library `exit` needs to be called. — Michael Petch, Jul 18 '16 at 15:41
Even without a call to sys_exit I still end up with a zombie. The %included files are trivial to generate: they just include the extern declarations for external symbols. I have used "nasm -f elf64 ..." and "ld -I/path/to/interpreter `pkg-config --libs gtk+-3.0` ...". I suspect that I need to call something else, which probably is some sort of process clean-up done by C's exit. However I'm not sure what this clean-up is. I tried to include a sys_wait4 call with -1 as pid, but even with that I still get a zombie. — Phoenix87, Jul 18 '16 at 15:46
If you just remove `mov rax, 60` / `xor rdi, rdi` / `syscall` (SYS_EXIT) that won't work because then your program will likely segfault walking random memory. The _C_ library `exit` function I mentioned usually performs cleanup that may be necessary (which may set your programs apart) for GTK to shut down properly (It will also do some thread related cleanup as I recall). `gtk.inc` and the other inc may be trivial to generate, but if you want anyone to take this seriously (or try your code out) then you may wish to provide them. Without them this is not a minimal complete verifiable example. — Michael Petch, Jul 18 '16 at 15:51
Circumventing the _C_ runtime can lead to problems like this. Either something isn't being initialized or cleaned up. I wouldn't be surprised if it was thread related. A greater question is why you want to circumvent the _C_ runtime? — Michael Petch, Jul 18 '16 at 15:54
The reason why I want to not include C libraries is because I don't see why they are necessary. I'm writing a Linux application that only needs to call APIs from the Gtk+ 3 libraries. — Phoenix87, Jul 18 '16 at 15:58
@MichaelPetch: Threads would perfectly explain all the observed symptoms. Thanks for the idea. — Peter Cordes, Jul 19 '16 at 04:45

score 3 · Accepted Answer · edited May 23 '17 at 11:44

Thanks @MichaelPetch for this idea which explains all the observed symptoms perfectly:

If gtk_main leaves any threads running when it returns, the most important difference between your two programs is that eax=60 / syscall (sys_exit) only exits the current thread. See the documentation in the _exit(2) man page, which points out that glibc's _exit() wrapper function has used exit_group since glibc 2.3.

exit_group(2) is eax=231 / syscall in the x86-64 ABI. This is what the CRT startup/cleanup code runs when main() returns.
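A minimal sketch of what that looks like in a `_start`-based program, assuming the GTK calls from the question go where the placeholder comment is (the placeholder is mine, not part of the original code):

```nasm
global _start

SECTION .text
_start:
    ; ... the GTK calls from the question, ending with `call gtk_main`, go here ...

    mov     eax, 231        ; exit_group system call number on x86-64 Linux
    xor     edi, edi        ; exit status 0 (writing edi zero-extends into rdi)
    syscall                 ; terminates every thread in the process, not just this one
```

The only change from the question's version is the system call number: 231 (exit_group) instead of 60 (exit).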

You can see this by using strace ./a.out on both versions.

This surprised me, at least: a process whose initial thread has exited while other threads are still running is shown as a zombie. I tried it on my own desktop (see the end of this answer for build commands and extern declarations so you don't need gtk.inc), and you really do get a process that's reported as a zombie, but one where Ctrl-C still kills the other threads that GTK leaves running when gtk_main returns.

$ ./thread-exit &   # or in the foreground, and do the following commands in another shell
[1] 20592

$ ps m -LF -p $(pidof thread-exit)
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY      STAT   TIME CMD
peter    20592  7749     -  0    3 109031 21920  - 06:28 pts/12   -      0:00 ./thread-exit
peter        -     - 20592  0    -     -     -   0 06:28 -        Sl     0:00 -
peter        -     - 20593  0    -     -     -   0 06:28 -        Sl     0:00 -
peter        -     - 20594  0    -     -     -   0 06:28 -        Sl     0:00 -

Then close the window: the process doesn't exit, and still has two threads running + 1 zombie.

$ ps m -LF -p $(pidof thread-exit)
UID        PID  PPID   LWP  C NLWP    SZ   RSS PSR STIME TTY      STAT   TIME CMD
peter    20592  7749     -  0    3     0     0   - 06:28 pts/12   -      0:00 [thread-exit] <defunct>
peter        -     - 20592  0    -     -     -   0 06:28 -        Zl     0:00 -
peter        -     - 20593  0    -     -     -   0 06:28 -        Sl     0:00 -
peter        -     - 20594  0    -     -     -   0 06:28 -        Sl     0:00 -

I'm not sure if ps m -LF is the best command for this, but it seems to work. It indicates that only the main thread has exited after you close the window, and 2 other threads are still running. You can even look at /proc/$(pidof thread-exit)/task directly, instead of using ps to do that for you.

re: comments about not wanting to link libc:

Avoiding glibc's CRT startup / cleanup (by defining _start instead of main) isn't the same thing as avoiding libc. Your code doesn't call any libc functions directly, but libgtk does. ldd /usr/lib/x86_64-linux-gnu/libgtk-3.so.0 shows that libgtk depends on libc, so the dynamic linker will map libc into your process anyway. In fact, ldd on your own program says that, even if you don't put -lc on the linker command line directly.

So you could just link libc and call exit(3) from your _start.
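As a rough sketch of that option, assuming a dynamically linked executable (so the dynamic linker has already initialized libc before `_start` runs) and assuming libc is named on the link line, for example with `-lc`, so that the `exit` symbol resolves:

```nasm
extern exit                 ; libc's exit(3)

global _start

SECTION .text
_start:
    ; ... the GTK calls from the question, ending with `call gtk_main`, go here ...

    xor     edi, edi        ; status = 0
    call    exit            ; runs libc's cleanup and ends the whole process; never returns
```

The stack is 16-byte aligned on entry to `_start`, so the `call` needs no extra alignment push here. Depending on how your toolchain builds executables (PIE or not), the call may need to go through the PLT; that detail is outside what this sketch covers.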

See this Q&A for info on building static vs. dynamic binaries that link libc or not and define _start or main, with NASM or gas.

Side-note: the version that defines main doesn't need to make a stack frame with rbp.

If you leave out the push rbp / mov rbp, rsp, you still have to do something to align the stack before the call, but it can be push rax, or still push rbp if you want to be confusing. So:

main:
    push    rax              ; align the stack
    ...
    call    gtk_widget_show

    pop     rax              ; restore stack to function-entry state
    jmp     gtk_main         ; optimized tail-call

If you want to keep the frame-pointer stuff, you could still do the tail call, but pop rbp / jmp gtk_main.
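In the same elided style as the snippet above, the frame-pointer variant might look like this (a sketch only; the `...` stands for the unchanged calls from the question):

```nasm
main:
    push    rbp              ; also aligns the stack
    mov     rbp, rsp
    ...
    call    gtk_widget_show

    pop     rbp              ; restore stack to function-entry state
    jmp     gtk_main         ; tail-call: gtk_main returns directly to main's caller
```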

PS: for those who want to try it themselves, this change lets you build it without having to go looking for a gtk.inc:

;%include "gtk.inc"
;%include "glib.inc"

extern gtk_init
extern gtk_window_new
extern g_signal_connect_data
extern gtk_window_set_title
extern gtk_widget_show
extern gtk_main
extern gtk_main_quit

Build with:

yasm -felf64 -Worphan-labels -gdwarf2 thread-exit.asm &&
gcc -nostdlib -o thread-exit thread-exit.o $(pkg-config --libs gtk+-3.0)

Thank you for your answer, `exit_group` solves the processes issue. Regarding zombies, this is what the System Monitor was reporting after I closed the window. It made sense to me since I was terminating the parent process without waiting to listen to the children's return value. Anyway, I can see why one might want to prefer libc in this case. The question was in the spirit of trying to understand and learn where I was going wrong with the first code and get a deeper insight of the libc internal mechanics in this case. — Phoenix87, Jul 19 '16 at 08:16
@Phoenix87: It's a good question, I was just annoyed about your terminology. But maybe I should try it myself, if you say a process-viewer tool reported it as a zombie. I just assumed that it wouldn't show as a zombie if there were still threads running, but maybe it does if the initial PID has exited. (Threads use the same numbering space as PIDs, i.e. sort of have their own PIDs, but getpid() returns the PID of the initial thread. So the "PID"s of the threads are actually thread-IDs. See /proc/*/task.) — Peter Cordes, Jul 19 '16 at 08:21
Also, note that a process doesn't have to wait for itself. The shell waits for all its children, so a process that you run directly from the shell always gets reaped promptly when it exits. This is why it's weird to have a zombie process. That doesn't happen for normal single-threaded programs when you run them from bash. — Peter Cordes, Jul 19 '16 at 08:26
@Phoenix87: BTW, you can make your `main()` more efficient / smaller with the suggestion in my last edit. — Peter Cordes, Jul 19 '16 at 08:37
@Phoenix87: update: yes, `ps` does report it as a zombie. I updated my answer with output from `ps m -LF` showing the threads (Light Weight Processes) before and after closing the window. Sorry for the false accusation of sloppy terminology; now I know what happens when you sys_exit while other threads are running :). (Still, you could have shown `ps` output yourself in the question to back up this surprising claim, along with code that could be tested without having to find a `gtk.inc` somewhere, or manually adding `extern` declarations for all the gtk symbols it uses, like I did.) — Peter Cordes, Jul 19 '16 at 09:40
Thanks for checking that indeed the example I have provided gives a zombie. Also, many thanks for the comments on the TCO. I agree that this particular main does not require a stack frame and that said TCO can be performed at the end, but since there is no risk of ending up with a stack overflow in this case, imho one can think of not implementing it for the sake of clarity. Anyway it's a good observation nonetheless :). — Phoenix87, Jul 19 '16 at 16:12
@Phoenix87: Any time I see asm code that looks inefficient (like an unoptimized tailcall), it makes me wonder why it wasn't done. Like was there some reason we had to do something after the `call`? The main reason for the optimization in user-space code is reducing instruction / uop count, as well as code size. `call` is multiple uops on Intel, IIRC, but `jmp` is one, and we can lose multiple other instructions. (See the [x86 tag wiki](http://stackoverflow.com/tags/x86/info) for optimization links and more.) — Peter Cordes, Jul 19 '16 at 17:10
