What does the POSIX standard say about thread stacks in atexit() handlers? What's the OS practice?

Question

When our UNIX/C program needs an emergency exit, we call the exit(3) function and rely on atexit(3) handlers for emergency clean-up. This approach worked fine until our application became multi-threaded, at which point the atexit() handlers stopped working predictably.

We learned by trial and error that threads may already be dead by the time an atexit() handler runs, and their stacks deallocated.

I failed to find a quote in the standard linking thread disappearance with atexit(): threads cease to exist after return from main(), but is that before or after the atexit() handlers are invoked? What's the actual practice on Linux, FreeBSD and Mac?

Is there a good pattern for emergency cleanup in a multi-threaded program?

score 5 · Answer 1 · answered Oct 05 '16 at 10:42

POSIX Standard

POSIX does not seem to define whether atexit handlers are called before or after threads are terminated by exit.

There are two (or three) ways for a process to terminate "normally".

1. All threads terminate. When the last thread exits, either by returning or by calling pthread_exit, atexit handlers are run. In this case there are no other threads. (This is platform dependent. Some platforms may terminate other threads if the main thread terminates other than by exit, others do not.)
2. One thread calls exit. In this case, atexit handlers will be run and all threads terminated. POSIX doesn't specify in what order.
3. main returns. This is more or less equivalent to calling exit() as the last line of main, so it can be treated as case 2.
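The first case can be checked with a small experiment. This is only a sketch, assuming a POSIX system with fork and pthreads; run_last_thread_demo and struct last_thread_result are made-up names for illustration. A child process registers an atexit handler, its main thread leaves via pthread_exit, and the handler should run once the remaining worker thread returns. The parent observes the result through a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* What the parent observed about the child (illustrative type). */
struct last_thread_result {
    int handler_ran; /* 1 if the atexit handler wrote its marker byte */
    int exit_status; /* child's exit status, or -1 if it did not exit normally */
};

static int marker_fd = -1;

/* atexit handler: tell the parent that we were called. */
static void mark_handler_ran(void) {
    ssize_t n = write(marker_fd, "A", 1);
    (void)n;
}

static void *short_lived_worker(void *arg) {
    (void)arg;
    struct timespec ts = {0, 20 * 1000 * 1000}; /* 20 ms */
    nanosleep(&ts, NULL);
    return NULL; /* the last thread returns here; nobody calls exit() */
}

/* Returns 0 on success and fills *out, -1 if the experiment could not run. */
int run_last_thread_demo(struct last_thread_result *out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    fflush(NULL); /* avoid duplicating buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        marker_fd = fds[1];
        pthread_t t;
        if (atexit(mark_handler_ran) != 0 ||
            pthread_create(&t, NULL, short_lived_worker, NULL) != 0) {
            _exit(1);
        }
        pthread_exit(NULL); /* main thread ends, the process keeps running */
    }
    close(fds[1]);
    char c;
    out->handler_ran = read(fds[0], &c, 1) == 1;
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    out->exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 0;
}
```

On a system that follows the behaviour described in case 1, handler_ran should come back as 1 and exit_status as 0, because the process ends as if exit(0) had been called when the last thread terminated.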

OS Practice

On Linux, the documentation (https://linux.die.net/man/2/exit) says that threads are terminated by _exit calling exit_group, and that _exit is called after the atexit handlers have run. Therefore, when exit is called on Linux, any atexit handlers run before the other threads are terminated. Note that they run on the thread that calls exit, not on the thread that called atexit.
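Both Linux claims above can be observed directly. The following is a sketch under assumptions, not a reference test: it assumes Linux with glibc, and run_exit_demo and struct exit_report are hypothetical names. In a forked child, the main thread registers the handler and keeps incrementing a counter, while a second thread calls exit. The handler reports which thread it runs on and whether the counter still advances while it is running.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* What the atexit handler observed (illustrative type). */
struct exit_report {
    int ran_on_exiting_thread; /* 1 if the handler ran on the thread that called exit() */
    int other_thread_alive;    /* 1 if the registering thread kept running meanwhile */
};

static int report_fd = -1;
static pthread_t exiting_thread;
static atomic_long ticks; /* incremented by the thread that registered the handler */

static void sleep_ms(long ms) {
    struct timespec ts = {ms / 1000, (ms % 1000) * 1000000L};
    nanosleep(&ts, NULL);
}

static void report_from_handler(void) {
    struct exit_report r = {0, 0};
    long before = atomic_load(&ticks);
    r.ran_on_exiting_thread = pthread_equal(pthread_self(), exiting_thread) != 0;
    /* Wait up to about 2 s for the other thread to make progress. */
    for (int i = 0; i < 200 && !r.other_thread_alive; i++) {
        sleep_ms(10);
        r.other_thread_alive = atomic_load(&ticks) != before;
    }
    ssize_t n = write(report_fd, &r, sizeof r);
    (void)n;
}

static void *call_exit(void *arg) {
    (void)arg;
    exiting_thread = pthread_self();
    exit(0); /* handlers run on this thread, not on the registering one */
}

/* Returns 0 on success and fills *out, -1 if the experiment could not run. */
int run_exit_demo(struct exit_report *out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    fflush(NULL); /* avoid duplicating buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        pthread_t t;
        if (atexit(report_from_handler) != 0 ||
            pthread_create(&t, NULL, call_exit, NULL) != 0) {
            _exit(1);
        }
        for (;;) { /* keep ticking until the process is torn down */
            atomic_fetch_add(&ticks, 1);
            sleep_ms(1);
        }
    }
    close(fds[1]);
    ssize_t n = read(fds[0], out, sizeof *out);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return n == (ssize_t)sizeof *out ? 0 : -1;
}
```

If the description above holds on your system, both fields come back as 1. That also illustrates the hazard: the handler runs concurrently with threads that are still alive and may still be modifying shared state.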

On Windows the behaviour is the same, if you care.

Patterns for emergency cleanup

The best pattern is: Never be in a state which requires emergency cleanup.

There is no guarantee that your cleanup will run because you could have a kill -9 or a power outage.
Therefore you need to be able to recover in that scenario.
If you can recover from that, you can also recover from abort, so you can use abort for your emergency exit.

If you can't do that, or if you have "nice-to-have" cleanup you want to do, atexit handlers should be fine provided you first gracefully stop all threads in the process to prevent entering an inconsistent state while doing cleanup.
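One way to arrange the "stop all threads first" step is a shared stop flag that workers poll, followed by a join on every worker before exit is called. The sketch below assumes a fixed pool of cooperative workers and that emergency_exit is called from a controlling thread rather than from a worker. All names (start_workers, stop_workers, emergency_exit, install_cleanup) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_WORKERS 4 /* illustrative pool size */

static pthread_t workers[NUM_WORKERS];
static int started;                /* number of workers actually created */
static atomic_bool stop_requested; /* set once, polled by every worker */
static atomic_int live_workers;    /* workers that have not finished yet */

static void *worker_main(void *arg) {
    (void)arg;
    struct timespec pause = {0, 1000000L}; /* 1 ms */
    while (!atomic_load(&stop_requested)) {
        nanosleep(&pause, NULL); /* stand-in for a bounded unit of real work */
    }
    atomic_fetch_sub(&live_workers, 1);
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
int start_workers(void) {
    atomic_store(&stop_requested, false);
    for (started = 0; started < NUM_WORKERS; started++) {
        atomic_fetch_add(&live_workers, 1); /* counted before the thread exists */
        if (pthread_create(&workers[started], NULL, worker_main, NULL) != 0) {
            atomic_fetch_sub(&live_workers, 1);
            return -1;
        }
    }
    return 0;
}

/* Ask every worker to stop and wait until each one has returned.
 * Call from the controlling thread, not from a worker: a thread
 * cannot join itself. */
void stop_workers(void) {
    atomic_store(&stop_requested, true);
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i], NULL);
    }
    started = 0;
}

int workers_alive(void) { return atomic_load(&live_workers); }

/* Nice-to-have cleanup. When reached via emergency_exit(), no worker
 * is running any more, so shared state is no longer changing. */
static void cleanup_at_exit(void) {
    fprintf(stderr, "cleanup with %d workers alive\n", workers_alive());
}

int install_cleanup(void) { return atexit(cleanup_at_exit); }

void emergency_exit(int status) {
    stop_workers(); /* quiesce first */
    exit(status);   /* then let the atexit handlers run */
}
```

A program would call install_cleanup() and start_workers() during start-up and emergency_exit(status) instead of a bare exit(status). This only works if every worker checks the flag often enough. A thread blocked indefinitely in a system call would need an additional wake-up mechanism, which is outside this sketch.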

I think this use case should be noted to the POSIX/ISO committee for discussion. That said, I agree that you never *want* to be in a state where you need to clean up, but I have found myself in such a place a time or two -- most recently, if my TAP-compliant unit/regression test code raises an exception, then I either get no message output, or I have to handle atexit. Not having a rigorous definition of atexit for multi-threaded programs could be problematic. — EBo, Feb 22 '18 at 13:25
Ben, thank you very much for your answer. Re never being in a state that requires an emergency cleanup, I'm a database kernel engineer and the issue arose since a database needs to properly close its write ahead log to speed up recovery. I agree with EBo, POSIX should define these semantics. — Kostja, May 01 '19 at 06:06
@Kostja if your program has crashed, you don't know that the actions "to properly close its write ahead log" won't make things worse. It crashed for a reason. All your data structures may be trashed, and you may be just overwriting the on-disk information you need to recover. — Ben, May 01 '19 at 07:52
