4

I learned how x86 assembly works in general from the book "Programming from the Ground Up". In this book, every program ends with an interrupt call to exit.

However, in compiled C programs, I found that programs end with a ret. This implies that there is a return address to be popped, and that jumping to it leads to the end of the program.

So my question is: what is this address? (And what code is there?)

DJ_Joe
  • 5
    They don't end with `ret`. `main` ends with `ret`, but that's not all of the code in the binary: the C runtime library contains start-up and tear-down code, which calls `main` and exits properly afterwards. (You can check the compiler switches to see all the arguments passed during linking, which libraries are added, and where the true entry point is.) – Ped7g Nov 16 '17 at 17:52
  • 2
    C programs are contained in a wrapper, which sets up the data etc, and the arguments supplied to `main(int argc, char *argv[])`. So the system call to exit is from the wrapper. – Weather Vane Nov 16 '17 at 17:52
  • 3
    There's some startup code that sets up the environment for the `main()` before calling it (things like initializing standard input, output, error file streams). The `ret` at the end of `main()` returns to that code, which typically does the equivalent of `exit(main(argc, argv, environ));`. (The `environ` is actually very common, but not mandated by the standard.) – Jonathan Leffler Nov 16 '17 at 17:52
  • Yeah, it's usually named 'crt' (C Run Time), and linked, one way or another, to your compiled code. When run on a non-trivial OS, the crt will make a syscall/interrupt to terminate the process after main() returns. – Martin James Nov 16 '17 at 18:24
  • The code there is very specific to the operating system, and possibly its version, because it is the operating system that loads the program into RAM and "calls" it; the program then returns with a ret to the code the operating system placed there to call it. The book you read was likely based on DOS programs, where you called exit to get the "OS" to clean up. But even there, a compiler could put the exit in the bootstrap code it provides, so that main simply returns. Compilers have to be tuned to their targets: the operating system as well as the instruction set. – old_timer Nov 16 '17 at 18:24
  • If you want to see the address, simply read it off the stack and print it out. – old_timer Nov 16 '17 at 18:24

2 Answers

4

Your program starts when the OS passes control to its entry point, conventionally the start or _start label, by jumping to it. In a C program this start function comes from the C library and (as others have already said) does some platform-specific environment initialization. It then calls your main, and control is yours. When you return from main, control passes back to the C library code, which terminates the program properly and makes the platform-specific system call that returns control to the OS.

So the address that main pops is a return address inside the C library's startup code, just after the instruction that called main. From there, the startup code calls exit (declared in stdlib.h, or cstdlib in C++), which does the cleanup.

In C++, exit destroys objects with static storage duration at program termination (since C++11, also the calling thread's thread-local objects). In C, it calls the functions registered with atexit, flushes and closes the open streams, and then makes the system call that ends the process.

I hope this is the answer you seek.

Corrosive
  • Actually `_start` comes from libc (in the sense that it is provided alongside libc), but is statically linked into your executable (along with the other CRT start code). So it *resides* in your executable, and you'll see it there if you disassemble (e.g. `objdump -drwC -Mintel a.out | less`) – Peter Cordes Nov 16 '17 at 19:00
  • You are completely right, it was just a wrong choice of the word. Thank you. – Corrosive Nov 16 '17 at 19:08
  • 1
    In glibc on x86-64 Linux, `__libc_start_main` uses a normal `call` instruction (`call rax`) and then separately calls `exit` with main's return value. `push OFFSET exit` / `jmp rax` wouldn't work, because `main` leaves its return value in `eax` but `exit()` looks for its arg in `edi`. Some ISAs (like ARM) use a calling convention where the return-value register is also the first arg-passing register, but none of the usual x86 calling conventions are like that so CRT needs a `mov edi,eax` before `call exit`. – Peter Cordes Nov 16 '17 at 19:14
  • 1
    re: work done by `exit`: it runs destructors for *static* objects only. It also runs any functions registered by `atexit`, so in a C program you can get destructor-like functionality. But that's mostly just nit-picking. – Peter Cordes Nov 16 '17 at 19:18
3

It is implementation specific.

On Linux, main is called by crt0. Its _start entry point reads the initial stack that the kernel set up while handling the execve(2) system call for your executable. When main returns, the epilogue part of crt0 runs the functions registered with atexit(3) and flushes stdio.

FWIW, crt0 is provided by your GCC compiler, and perhaps your C standard library. All of this (along with the Linux kernel) is free software on a Linux distribution.

"every program ends with an interruption call to exit."

Not really. It is a system call (see syscalls(2) for their list), not an interrupt. See also this.

Basile Starynkevitch