
I saw "Getting a backtrace of other thread", but it didn't contain much practical information.

What I want is to catch SIGSEGV in a multi-threaded C app using POSIX threads on Linux (CentOS, 2.6 kernel) and print the stack trace of the thread that caused it. Of course, if I don't know which thread caused it, it's Good Enough For Me (tm) if the main thread that caught the signal enumerates all the threads and prints the stack trace of each of them.

It was suggested there that libunwind could be used for this, but its documentation is rather lacking and I couldn't find a good example of how to use it for this purpose. I also wonder whether it has any significant performance overhead or other impact, and whether it is battle-tested and used in production code, or mostly used only for debugging and development rather than in production systems.

Does anyone have sample code using libunwind or another reasonably straightforward (like not writing it in assembly) way to do this?

e.dan

1 Answer


Getting the backtrace of the thread that caused the exception is easy, more or less:

Pass the -rdynamic flag to the linker.

Then, in your code, register a signal handler (with sigaction() and SA_SIGINFO), extract the EIP of the fault (RIP on x86_64) from the context passed to the handler, and use it together with the return addresses collected by backtrace() to build an array of addresses.
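A minimal sketch of that step, assuming x86 or x86_64 Linux with glibc; `crash_handler_install()`, `crash_fd` and `MAX_FRAMES` are hypothetical names made up for this example. The handler stores the faulting instruction pointer (`REG_RIP`/`REG_EIP` from the `ucontext_t`) in slot 0, appends whatever `backtrace()` returns, and ships the raw addresses out with a single `write()`; nothing is translated to symbol names here. Note that `backtrace()` is not on the POSIX async-signal-safe list; the sketch follows the glibc man page's advice to call it once up front so libgcc is already loaded before any crash.

```c
#define _GNU_SOURCE                 /* glibc: exposes REG_RIP / REG_EIP */
#include <execinfo.h>               /* backtrace() */
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

#define MAX_FRAMES 64

static int crash_fd = -1;           /* hypothetical: where raw reports are written */

static void crash_handler(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;
    void *report[MAX_FRAMES + 1];   /* [0] = faulting IP, [1..] = backtrace() */
    (void)info;                     /* info->si_addr is the faulting *data* address */

#if defined(__x86_64__)
    report[0] = (void *)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    report[0] = (void *)uc->uc_mcontext.gregs[REG_EIP];
#else
    (void)uc;
    report[0] = NULL;               /* other architectures are not handled here */
#endif

    int n = backtrace(report + 1, MAX_FRAMES);

    /* One write() of at most PIPE_BUF bytes to a pipe is atomic; no stdio,
     * no malloc, no locks. */
    if (crash_fd >= 0) {
        ssize_t r = write(crash_fd, report, (size_t)(n + 1) * sizeof report[0]);
        (void)r;
    }

    /* Restore the default action and re-raise: the signal stays blocked until
     * this handler returns, then kills the process (core dump if enabled). */
    signal(sig, SIG_DFL);
    raise(sig);
}

int crash_handler_install(int report_fd)
{
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE };
    void *warmup[1];
    struct sigaction sa;

    /* The first backtrace() call may load libgcc, which can allocate memory,
     * so make that call now rather than inside the handler. */
    backtrace(warmup, 1);

    crash_fd = report_fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;       /* needed to receive the ucontext_t argument */
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

`report_fd` would typically be the write end of a pipe to a helper process started before any crash. Catching a SIGSEGV caused by stack overflow would additionally need `sigaltstack()` and `SA_ONSTACK`, which this sketch leaves out.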

Find some way to pass the data in the array outside your app (for example, to a different process over a pipe); there you can use backtrace_symbols() to translate the backtrace into symbol names.

Make sure not to call any async-signal-unsafe function in the signal handler: don't take any locks, don't allocate memory, and don't call any function that does.
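As an illustration of code that stays within those limits, here is a sketch of printing an address without `printf()`. `fmt_hex()` and `safe_write_ptr()` are hypothetical helpers: they only use the caller's stack and call `write()`, which is on the POSIX list of async-signal-safe functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Formats v as "0x..." in lowercase hex into buf, which must hold at least
 * 2 + 2 * sizeof(uintptr_t) + 1 bytes. No locks, no malloc, no stdio.
 * Returns the length written, excluding the terminating NUL. */
size_t fmt_hex(char *buf, uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(uintptr_t) * 2];
    size_t n = 0, len = 0;

    do {                            /* least-significant digit first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)                   /* reverse into the output */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Writes "<label><pointer>\n" to fd using only a stack buffer and write(). */
void safe_write_ptr(int fd, const char *label, const void *p)
{
    char line[80];
    size_t len = 0;

    /* Copy the label by hand, truncating it so the pointer still fits. */
    while (label[len] != '\0' && len < sizeof line - 20) {
        line[len] = label[len];
        len++;
    }
    len += fmt_hex(line + len, (uintptr_t)p);
    line[len++] = '\n';

    ssize_t r = write(fd, line, len);   /* partial writes ignored in this sketch */
    (void)r;
}
```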

Here are the slides from a presentation I gave on the subject: http://www.scribd.com/doc/3726406/Crash-N-Burn-Writing-Linux-application-fault-handlers

There is also a video of the talk somewhere, but I can't find it now...

Extending this to get the backtrace of multiple threads is possible but quite tricky: you need to keep track of your various threads and send signals to them in the event of a crash.
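A rough sketch of that bookkeeping, assuming Linux with glibc and that `SIGUSR2` is free for this purpose; `thread_registry_add()`, `thread_dump_init()`, `dump_all_threads()` and `struct thread_report` are all hypothetical. Each thread registers itself at startup. `dump_all_threads()`, meant to be called from the crash handler, sends the dump signal to every other registered thread with `pthread_kill()`, writes its own trace, and busy-waits (with a bound) for the rest. Every thread writes one fixed-size record with a single `write()`, so records from different threads don't interleave on a pipe. `pthread_kill()`, `write()` and `memset()` are on the POSIX async-signal-safe list; `backtrace()` and `pthread_equal()` are not, so relying on them here is an assumption, not a guarantee.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <execinfo.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

#define MAX_THREADS 64
#define MAX_FRAMES  32
#define DUMP_SIGNAL SIGUSR2         /* assumption: the app does not use SIGUSR2 */

/* Fixed-size record, small enough (< PIPE_BUF) for one atomic pipe write. */
struct thread_report {
    long nframes;
    void *frames[MAX_FRAMES];
};

static pthread_t registry[MAX_THREADS];
static atomic_int registry_count;   /* entries [0, registry_count) are published */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int pending;          /* threads that have not reported yet */
static int dump_fd = -1;

static void write_own_backtrace(void)
{
    struct thread_report rep;
    memset(&rep, 0, sizeof rep);
    rep.nframes = backtrace(rep.frames, MAX_FRAMES);
    ssize_t r = write(dump_fd, &rep, sizeof rep);
    (void)r;
}

static void dump_handler(int sig)
{
    int saved_errno = errno;        /* don't clobber errno of the interrupted code */
    (void)sig;
    write_own_backtrace();
    atomic_fetch_sub(&pending, 1);
    errno = saved_errno;
}

/* Normal context only (takes a mutex): call at the start of each thread. */
int thread_registry_add(void)
{
    int rc = -1;
    pthread_mutex_lock(&registry_lock);
    int n = atomic_load_explicit(&registry_count, memory_order_relaxed);
    if (n < MAX_THREADS) {
        registry[n] = pthread_self();
        /* release: the slot is written before the new count becomes visible */
        atomic_store_explicit(&registry_count, n + 1, memory_order_release);
        rc = 0;
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

int thread_dump_init(int fd)
{
    void *warmup[1];
    struct sigaction sa;

    backtrace(warmup, 1);           /* preload libgcc outside signal context */
    dump_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = dump_handler;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(DUMP_SIGNAL, &sa, NULL);
}

/* Dumps the caller's stack, asks every other registered thread to dump its
 * own, then waits a bounded time for them. No locks, no sleeping. */
void dump_all_threads(void)
{
    pthread_t self = pthread_self();
    int n = atomic_load_explicit(&registry_count, memory_order_acquire);

    atomic_store(&pending, 0);
    for (int i = 0; i < n; i++) {
        if (pthread_equal(registry[i], self))
            continue;
        atomic_fetch_add(&pending, 1);   /* count first so the handler can't underflow it */
        if (pthread_kill(registry[i], DUMP_SIGNAL) != 0)
            atomic_fetch_sub(&pending, 1);
    }
    write_own_backtrace();
    for (long spin = 0; spin < 1000000000L && atomic_load(&pending) > 0; spin++)
        ;
}
```

This sketch never removes threads from the registry, so it assumes registered threads live until the process dies; calling `pthread_kill()` on a thread that has already exited is not safe. Build with `-pthread`.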

gby
  • I saw these slides before and they are great, let me know when you find the video. – Karoly Horvath Aug 03 '11 at 08:09
  • @gby: Thanks for the info, but I am not 100% with you. Why is `-rdynamic` needed? Where is the code for these slides? Couldn't find it. Will it work on `x86_64` arch? How do I use the EIP with `backtrace()`? Why does it need to be done outside the process - only because I can't get the symbols in the sighandler? If we have other arch-specific code that walks the ELF symbol table in a sighandler, I can just use that instead, right? – e.dan Aug 03 '11 at 08:53
  • @e.dan backtrace_symbols() uses the dynamic linker symbols to translate addresses to symbol names. Without -rdynamic, the linker will only embed symbols for external libraries; with it, you'll get all non-static symbols. The code is at https://github.com/gby/libcrash. It works great on x86_64. About EIP, you'll have to read the slides :-). – gby Aug 03 '11 at 17:21
  • @e.dan You need to do it outside the process because the signal handler of a process that has just hit a segmentation fault is a very limited place: you can't call any function that takes a lock or allocates memory. That means no printf() and no backtrace_symbols(), for example. You can use your own code in the signal handler if it doesn't need locks or memory allocation. – gby Aug 03 '11 at 17:21