
As the title asks: is it useful to always collect a software-based backtrace (for example with glibc's backtrace, http://www.gnu.org/software/libc/manual/html_node/Backtraces.html) in all error functions and signal handlers?

Wouldn't it be very helpful for debugging a wide variety of bugs, such as memory and concurrency bugs? I assume it would not hurt normal performance either, since it would be triggered only on error paths.

user655617

1 Answer


As the question states, is it useful to always collect a software-based backtrace

Yes, it is generally very useful to have a crash stack trace when:

  • your code runs in your own environment, and you are not worried about the stack trace revealing any secrets.
  • the crash handler does not further corrupt the core dump, does not hang, etc.

like using libc backtrace

glibc's backtrace calls calloc under certain conditions, so it is not safe in a crash handler. It can cause both the hang and the further corruption mentioned above. Writing a crash handler that reliably prints a stack trace in an async-signal-safe manner is far from trivial.

why then do error functions in "standard" applications not call backtrace?

Consider cat /no/such/file. Currently it produces:

cat: /no/such/file: No such file or directory

which is all you really need to know. Printing anything more adds no information. If you ran cat on many such files and it printed a full stack trace for each, you'd get many pages of error output, and that would only make the real problem harder to find.
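To make the noise argument concrete, here is a minimal sketch of an error function that prints the one-line message by default and appends a stack trace only when the caller asks for it. The names report_error and want_trace are hypothetical, and the sketch assumes glibc's <execinfo.h>. It runs in ordinary (non-signal) context, so the async-signal-safety concerns above do not apply to it.

```c
#include <errno.h>
#include <execinfo.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: prints "prog: path: reason" like cat does.
 * Returns the number of stack frames printed (0 when want_trace is 0). */
static int report_error(const char *prog, const char *path, int err,
                        int want_trace)
{
    fprintf(stderr, "%s: %s: %s\n", prog, path, strerror(err));
    if (!want_trace)
        return 0;

    /* Opt-in only: each error would otherwise add many lines of output. */
    void *frames[64];
    int n = backtrace(frames, 64);
    fflush(stderr);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Calling report_error("cat", "/no/such/file", ENOENT, 0) prints just the usual one-line message; passing 1 as the last argument adds one line per stack frame, which is the output that piles up when many files fail.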

For fatal signal handlers (e.g. SIGSEGV), the answer is that most "standard" applications don't actually handle such signals. They simply use the default action, which terminates the process and produces a core dump (where core dumps are enabled).

But if they did catch the signal, calling backtrace, backtrace_symbols, or backtrace_symbols_fd from the signal handler would be unsafe for the reason given above, and could deadlock, which is much worse than simply dumping core. Consider what happens if you have a long-running script with 1000 commands in it. You start it, and a week later discover that it made no progress because the second command crashed and then deadlocked trying to print the crash stack trace.

Employed Russian
  • Thanks for the information! I was wondering, why then do error functions in standard applications like Coreutils not call backtrace by default? – user655617 Jan 08 '13 at 22:13
  • @user655617 "why then do error functions ... not call backtrace?" Do you mean error functions (e.g. "cat unreadable-file"), or do you mean crash functions? If the former, a stack trace will likely just add noise. If the latter, I don't think you've understood the "may cause a hang" part of the answer: it's definitely much better to crash with no stack trace than to (sometimes) hang indefinitely. – Employed Russian Jan 09 '13 at 02:10
  • Hmm. For error functions: why would a stack trace just add noise? For crash functions: I understand your concern that backtrace() and backtrace_symbols() are not safe to call within a signal handler, but can't we use backtrace_symbols_fd() to overcome this problem? – user655617 Jan 09 '13 at 21:28
  • @user655617 I've updated the answer. `backtrace_symbols_fd` is *not* any safer than `backtrace`. What made you think otherwise? – Employed Russian Jan 09 '13 at 22:17
  • Well, I read that `backtrace_symbols_fd` does not call `malloc`, and so can be employed in situations where the latter function might fail [link](http://linux.die.net/man/3/backtrace_symbols_fd), and this was also mentioned at [link](http://stackoverflow.com/questions/77005/how-to-generate-a-stacktrace-when-my-gcc-c-app-crashes#comment15654678_77005). So why is it also _not_ safe? Also, I guess `ioctl` calls to a device that uses `printk` are also unsafe in signal handlers? – user655617 Jan 10 '13 at 00:26