
The question is about programmatically printing a meaningful stack trace in an optimized binary. For example, we can use backtrace, backtrace_symbols, and abi::__cxa_demangle to print a stack trace. But as far as I know, we need to build the binaries with the -g compiler flag and no optimization above -O1. I can achieve this.
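For reference, here is a minimal sketch of the backtrace / backtrace_symbols / abi::__cxa_demangle combination described above. It assumes Linux with glibc and a GNU toolchain, and `capture_backtrace` and `demangle_frame` are hypothetical helper names. backtrace_symbols generally only reports names that are in the dynamic symbol table, so the binary usually has to be linked with -rdynamic. Static functions and inlined calls still won't show up. Because it allocates memory, it is meant for ordinary code paths, not for signal handlers.

```cpp
#include <cxxabi.h>    // abi::__cxa_demangle
#include <execinfo.h>  // backtrace, backtrace_symbols
#include <cstdlib>     // std::free
#include <string>
#include <vector>

// Demangle the symbol inside one backtrace_symbols line, which on glibc looks
// like "module(mangled+0xoff) [0xaddr]". Lines that cannot be demangled are
// returned unchanged.
static std::string demangle_frame(const std::string& frame) {
    const auto open = frame.find('(');
    if (open == std::string::npos) return frame;
    const auto plus = frame.find('+', open);
    if (plus == std::string::npos || plus == open + 1) return frame;
    const std::string mangled = frame.substr(open + 1, plus - open - 1);
    int status = 0;
    char* name = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || name == nullptr) return frame;  // e.g. C symbols like "main"
    std::string result = frame.substr(0, open + 1) + name + frame.substr(plus);
    std::free(name);  // __cxa_demangle returns a malloc'ed buffer
    return result;
}

// Capture up to max_frames return addresses of the calling thread and turn
// them into strings, innermost frame first.
// NOT async-signal-safe: backtrace_symbols and __cxa_demangle call malloc.
std::vector<std::string> capture_backtrace(int max_frames = 64) {
    std::vector<void*> addrs(max_frames);
    const int n = backtrace(addrs.data(), max_frames);
    std::vector<std::string> frames;
    char** symbols = backtrace_symbols(addrs.data(), n);
    if (symbols == nullptr) return frames;
    for (int i = 0; i < n; ++i) frames.push_back(demangle_frame(symbols[i]));
    std::free(symbols);  // a single free for the whole array, per the man page
    return frames;
}
```

For example, `demangle_frame("./prog(_Z3foov+0x1a) [0x401136]")` yields `"./prog(foo()+0x1a) [0x401136]"`. With -O3, any functions that were inlined into their callers simply have no frame of their own in this output.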

I want to generate a backtrace with proper function names in a release binary, e.g. one compiled with -O3.

Is it viable? I did quite a lot of research on this, but couldn't find anything substantial.

Update 1: Is there a way to have a secondary file containing some symbols that can be referred to when generating a stack trace from within the optimized binary's process?

user3840170
Yogesh
  • Just a remark. I think with `-O3` the compiler can omit the frame pointer, making it harder to get the stack trace. *gcc* has a flag for that, e.g. `-O3 -fno-omit-frame-pointer` – BiagioF Aug 01 '19 at 10:45
  • @BiagioFesta Thanks, let me try that. Will it have any impact on performance? – Yogesh Aug 01 '19 at 10:46
  • Well, it's not totally free, of course. Technically, omitting the frame pointer gives the compiler one more register to use. *"The -fomit-frame-pointer option instructs the compiler to not store stack frame pointers if the function does not need it. You can use this option to reduce the code image size."* – BiagioF Aug 01 '19 at 10:48
  • The purpose of -O3 is to optimize aggressively, and one of these optimizations is to suppress as many names and variables as possible. So I doubt you can get all the proper function names in that case. – gaFF Aug 01 '19 at 10:51
  • @BiagioFesta even with "-O3 -fno-omit-frame-pointer" I am not getting the complete stack trace. I have a stack like main->fun1->fun2->...->fun5; fun5 raises a signal, which I handle in signalHandler. The fun1->fun2->...->fun5 part of the stack is omitted, maybe because of compiler optimization. – Yogesh Aug 01 '19 at 10:52
  • Aggressive optimizations could do function inlining, which makes part of the call stack seem to be missing. – Some programmer dude Aug 01 '19 at 10:53
  • @Someprogrammerdude exactly. – Yogesh Aug 01 '19 at 10:53
  • I will update the question: is there a way to have a secondary file containing some symbols that can be referred to when generating a stack trace from within the process? (on Linux) – Yogesh Aug 01 '19 at 10:54
  • *is there a way we can have a secondary file containing some symbols and that can be referred to generate stack trace from within the process? (on Linux)* I don't see how that can be necessary. The only possible source for what would presumably be address-to-symbol-name mappings would be the executable itself, which obviously is available to the process. There might be performance reasons to have such a precomputed file; however, [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) will likely make your precomputed values meaningless. – Andrew Henle Aug 01 '19 at 11:16
  • The goals you want to achieve are incompatible. Imagine basically *any* template code, like the core of the [`<algorithm>` header](https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algo.h). So many functions are there just to call another function. In the end the operation of a statement can be very simple, but may well expand to a 10-item-deep backtrace. That's what `-O3` prevents by aggressive inlining. Use `-O2` or `-Og` if you want to retain every link of the call chain. – The Vee Aug 01 '19 at 11:24
  • @TheVee agreed, retaining every link in the call chain is not possible. But let's say I only want to print the functions that are actually there (not inlined) in the stack; that also seems to be impossible, maybe because the process does not contain the actual symbol names. – Yogesh Aug 01 '19 at 11:31
  • @AndrewHenle yeah, it's not necessary. I am just thinking out loud; it seems to me that the goal is not achievable in traditional ways. – Yogesh Aug 01 '19 at 11:32
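On the ASLR concern raised in the comments, one way around it is to record module-relative offsets instead of absolute addresses. Here is a minimal sketch under these assumptions: Linux/glibc, and position-independent code (PIE executables, the default on many current distributions, or shared libraries). dladdr reports the load base of the object that contains an address. Subtracting that base gives an offset that stays the same from run to run. `FrameLocation` and `locate` are hypothetical names. dladdr is a GNU extension and is not on the POSIX async-signal-safe list. On glibc older than 2.34 it also needs -ldl.

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (GNU extension; g++ defines _GNU_SOURCE)
#include <cstdint>
#include <string>

// Hypothetical result type: which ELF object an address belongs to, and
// where inside it.
struct FrameLocation {
    std::string module;     // path reported by dladdr (may be empty for the main program)
    std::uintptr_t offset;  // address minus the object's load base
};

// Map a runtime address (e.g. one returned by backtrace()) to a
// module-relative offset. For PIE executables and shared libraries the
// offset is independent of ASLR, so it can be resolved later, outside the
// process, against the unstripped binary or its separate debug file.
bool locate(const void* addr, FrameLocation& out) {
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fbase == nullptr) return false;
    out.module = info.dli_fname ? info.dli_fname : "";
    out.offset = reinterpret_cast<std::uintptr_t>(addr) -
                 reinterpret_cast<std::uintptr_t>(info.dli_fbase);
    return true;
}
```

An offset logged this way could later be fed to something like `addr2line -f -C -i -e prog.debug 0x1136`. Here the file name and offset are only illustrative. The -i option additionally prints inlined callers when DWARF info was generated with -g. For a non-PIE executable, addr2line expects the absolute address instead of the offset.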



Printing backtrace within signal handler

Regardless of optimisation level, it is not safe to call backtrace¹, backtrace_symbols¹, or abi::__cxa_demangle in a signal handler. They are not async-signal-safe functions, and may cause the program to crash, corrupt memory or freeze if used within a signal handler. Regarding printing, in case you were planning to use any of the printf family of functions, know that they are also not safe to use in a signal handler (this applies to at least all of the ones specified by POSIX).

There are libraries / functions that promise signal-safe stack unwinding, as well as demangling, formatting and output, which make this possible.

¹ According to the man pages, using backtrace should be OK as long as the shared libgcc has been loaded beforehand. backtrace_symbols has a safer alternative, backtrace_symbols_fd, which has the same caveat regarding libgcc.
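Following the footnote, here is a minimal sketch of a crash handler restricted to backtrace, backtrace_symbols_fd and write. It assumes Linux/glibc, and `install_crash_handler`, `write_backtrace` and `crash_handler` are hypothetical names. The sketch calls backtrace once at install time so that libgcc is already loaded. It then uses SA_RESETHAND and re-raises the signal, so the default action, normally a core dump, still happens. The output is the raw `module(symbol+offset) [address]` form, not demangled. Demangling can be done afterwards, outside the crashing process. Crashes caused by stack overflow would additionally need sigaltstack and SA_ONSTACK, which this sketch does not set up.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd
#include <signal.h>    // sigaction, sigemptyset, raise
#include <unistd.h>    // write, STDERR_FILENO

namespace {
constexpr int kMaxFrames = 64;
void* g_frames[kMaxFrames];  // static buffer: nothing is allocated in the handler

// Write one "module(symbol+offset) [address]" line per frame of the calling
// thread to fd. backtrace_symbols_fd writes directly to the descriptor
// instead of returning malloc'ed strings.
void write_backtrace(int fd) {
    const int n = backtrace(g_frames, kMaxFrames);
    backtrace_symbols_fd(g_frames, n, fd);
}

void crash_handler(int sig) {
    static const char msg[] = "fatal signal, backtrace follows:\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    write_backtrace(STDERR_FILENO);
    // SA_RESETHAND restored SIG_DFL on entry, so re-raising ends the process
    // with the default action (normally a core dump).
    raise(sig);
}
}  // namespace

void install_crash_handler() {
    // Call backtrace once up front so libgcc is loaded before any crash.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa{};
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // one-shot handler
    const int fatal_signals[] = {SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT};
    for (int sig : fatal_signals) sigaction(sig, &sa, nullptr);
}
```

install_crash_handler would be called once early in main. Keeping the handler this small is deliberate: the expensive work (symbolizing, demangling) is left to a process that is not in the middle of crashing.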


Is there a way that we can have a secondary file containing some symbols

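You can copy the debug symbols from the executable into a separate file using objcopy --only-keep-debug, and remove them from the executable using strip. objcopy --add-gnu-debuglink then records the name of the separate file in the executable so that tools such as GDB can find it.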

GDB supports external symbol files, but I don't know if or how they can be used from within the program. I've used SymtabAPI to dig symbols out of binaries; that might work with external symbol files as well, but that library does not promise signal safety as far as I know. That said, it's unclear why the separation would be needed; the debug symbols don't affect performance.


I am going to print the stack only if the process crashes

In this case, a possibly better approach might be to simply let the operating system generate a core dump, and have a separate process listen for file system events; once a core dump is created, that process generates a backtrace and writes it to some log. There are no worries about signal safety, no need to delay restarting the original process while the trace is generated, and no extra dependencies in the server process.
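The watcher half of that approach could look roughly like the following sketch using Linux inotify. The sketch makes several assumptions. First, /proc/sys/kernel/core_pattern must write plain files into a known directory. A pattern starting with `|` pipes the core to a program such as systemd-coredump instead. Second, core files are assumed to be named `core*`, for example with /proc/sys/kernel/core_uses_pid enabled to get per-PID names. Third, IN_CLOSE_WRITE is treated as the signal that the kernel has finished writing the file. `start_core_watch` and `read_core_events` are hypothetical names.

```cpp
#include <sys/inotify.h>  // inotify_init1, inotify_add_watch, inotify_event
#include <unistd.h>       // read, close
#include <cstring>        // std::strncmp
#include <stdexcept>
#include <string>
#include <vector>

// Watch `dir` for files that were open for writing and have been closed.
int start_core_watch(const std::string& dir) {
    const int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0) throw std::runtime_error("inotify_init1 failed");
    if (inotify_add_watch(fd, dir.c_str(), IN_CLOSE_WRITE) < 0) {
        close(fd);
        throw std::runtime_error("inotify_add_watch failed: " + dir);
    }
    return fd;
}

// Block until at least one event arrives, then return the names of the
// finished files in that batch that look like core dumps ("core*").
std::vector<std::string> read_core_events(int fd) {
    alignas(inotify_event) char buf[4096];  // room for many events
    const ssize_t len = read(fd, buf, sizeof buf);
    if (len <= 0) throw std::runtime_error("read on inotify fd failed");
    std::vector<std::string> cores;
    for (const char* p = buf; p < buf + len;) {
        const auto* ev = reinterpret_cast<const inotify_event*>(p);
        if (ev->len > 0 && std::strncmp(ev->name, "core", 4) == 0)
            cores.emplace_back(ev->name);
        p += sizeof(inotify_event) + ev->len;  // events are variable-length
    }
    return cores;
}
```

A real watcher would loop on read_core_events. It would then hand each file to a symbolizing step, for instance `gdb -batch -ex bt <binary> <core file>`, and append the output to a log. Running gdb there is an assumption about the deployment, not something the sketch does.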


As far as the optimisation level goes, regardless of what method you use to generate the trace, you could try -O3 -fno-omit-frame-pointer and hope for the best, but it's usually best not to go higher than -O2 for debugging. -Og is ideal, but not as fast.

eerorika
  • Thanks for taking the time to answer, I am aware of this. The requirements are: I am going to print the stack only if the process crashes. In that case I can bite the bullet: as soon as I receive the signal, I unregister the signal handler and proceed to print the stack trace and some other diagnostic details. I can use the write system call, but backtrace is something I will have to use. So this is not 100% correct, but it can do wonders without any harm. Also, the requirement of being "fast" is inexorable. It's a very, very low-latency server. – Yogesh Aug 01 '19 at 11:41
  • @Yogesh Note that you can't really safely demangle symbols in a signal handler. `__cxa_demangle()` uses `malloc()`-family calls, and `SIGSEGV` from within `malloc()`/`free()` *et al* is all too common should the heap become corrupted. You'll likely deadlock the process if you try to demangle in a signal handler. See https://stackoverflow.com/questions/23680297/which-signal-was-delivered-to-process-deadlocked-in-signal-handler?noredirect=1&lq=1 In my experience, `backtrace()` isn't as dangerous. – Andrew Henle Aug 01 '19 at 11:48
  • @AndrewHenle thanks Andrew, I am aware of that. Actually, I went through the same post that you linked in the comments. So even if I use the libraries suggested by eerorika, I don't want to compromise on the -O3 flag. That's my concern. – Yogesh Aug 01 '19 at 11:54
  • @eerorika thanks for being thoughtful. "and once a core dump is created, generate a back trace and write to some log" - but this core will not have the necessary information to debug, as the binary is compiled with -O3. It's a nice alternative, but I was not only looking to dump the stack trace; I also collect a lot of other information in the process, like connectivity, database-related state, register contents, etc. The concern here is that an -O3 compiled binary will not bear enough information in the core dump. – Yogesh Aug 01 '19 at 12:02
  • @Yogesh `but this core will not have the necessary information to debug, as the binary is an O3 flag compiled` That problem exists regardless of how you generate the back trace, and can only be solved by not using O3. Generating the trace within the program doesn't solve the problem, but introduces new problems with signal safety instead. – eerorika Aug 01 '19 at 12:03
  • @eerorika Exactly, I cannot agree more. I have been thinking about this problem for quite some time, and it's clumsy for engineers to revert binaries in the production environment and then look at the issue. That's why I posted to this elite group: to see whether it is viable and think it through together. It's an interesting problem, though. I will keep working on it, but any advice is more than welcome. An upvote for your efforts. – Yogesh Aug 01 '19 at 12:07
  • Actually, I'm afraid it's the only way... I think the generated core file is what you should use, along with split debug symbols. If you're worried about leaking information about the code, you could avoid distributing the debug files: keep a copy of each build you have delivered, and make sure the core dump has a "version" identifier somewhere so you can figure out which version the core was generated from. You can also make Linux create per-PID core files, so you can have multiple dumps from the same machine for different problems. – xception Aug 01 '19 at 12:55
  • @xception thanks for paying heed. What do you mean by "along with split debug symbols"? The concern is that a core dump generated by an -O3 optimized binary will not have enough information to print a stack trace. – Yogesh Aug 02 '19 at 06:04
  • @Yogesh yes, it may or may not have, but the same is true for -O2: optimization might inline some function calls and optimize some things away, so the stack trace might not be complete, but it will still show you most of what happened. For split debug symbols, look here https://stackoverflow.com/questions/866721/how-to-generate-gcc-debug-symbol-outside-the-build-target ... Gentoo can do this for the whole system if told to (that's how I learned it exists). – xception Aug 02 '19 at 08:13