
I'm using the 'backtrace()' and 'backtrace_symbols_fd()' functions in a signal handler to generate a backtrace for debugging (GDB is not available).

They work fine on an x86 desktop (Ubuntu), but on the target device (ARM-based) the backtrace on an abort signal (due to a double-free error) shows only three frames: the signal handler and two from within libc, which is not useful for debugging our code! A backtrace on SEGV (e.g. from using a bad pointer) DOES produce a good backtrace.

Why can't I get a useful backtrace on ABRT signal on ARM?

[Question edited for clarity]

Here's a simple test program which demonstrates the problem:

#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

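// Signal handler to catch seg fault and abort:
void handler_segv(int sig) {
    // get void*'s for all entries on the stack
    void *array[10];
    int size;  // backtrace() returns int, and backtrace_symbols_fd() takes int
    size = backtrace(array, 10);
    // note: fprintf() is not async-signal-safe; kept here for simplicity
    fprintf(stderr, "Error: Signal %d; %d frames found:\n", sig, size);
    // print out all the frames to stderr
    backtrace_symbols_fd(array, size, STDERR_FILENO);
    _exit(1);  // exit() is not async-signal-safe; _exit() is
}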


void crashme()
{
  // Deliberate Error: Abort (double free):
  char *test_ptr = malloc(1);
  free(test_ptr);
  free(test_ptr);
  // Deliberate Error #2: Seg fault:
  //char * p = NULL;
  //*p = 0;
}

void foo()
{
    fprintf(stdout, "---->About to crash...\n");
    crashme();
    fprintf(stdout, "---->Crashed (shouldn't get to here)...\n");
}



// Main entry point:
int main(int argc, char *argv[])
{
    fprintf(stdout, "Application start...\n");

    // Install signal handlers:
    fprintf(stdout, "-->Adding handler for SIGSEGV and SIGABRT\n");
    signal(SIGSEGV, handler_segv);
    signal(SIGABRT, handler_segv);

    fprintf(stdout, "-->OK. Causing Error...\n");
    foo();
    fprintf(stdout, "-->Test finished (shouldn't get to here!)\n");
    return 0;
}

This was compiled for x86 as follows:

gcc -o test test-backtrace-simple.c -g -rdynamic

And for ARM:

arm-none-linux-gnueabi-gcc -o test-arm test-backtrace-simple.c -g -rdynamic -O0 -mapcs-frame -funwind-tables -fasynchronous-unwind-tables

I've used various compiler options for ARM as described in other posts related to generating backtraces on ARM.

When run on the x86 desktop, it generates the expected output with plenty of debug information, ending in:

Error: Signal 6; 10 frames found: 
./test(handler_segv+0x19)[0x80487dd]
[0xb7745404] 
[0xb7745428]
/lib/i386-linux-gnu/libc.so.6(gsignal+0x4f)[0xb75b0e0f]
/lib/i386-linux-gnu/libc.so.6(abort+0x175)[0xb75b4455]
/lib/i386-linux-gnu/libc.so.6(+0x6a43a)[0xb75ed43a]
/lib/i386-linux-gnu/libc.so.6(+0x74f82)[0xb75f7f82]
./test(crashme+0x2b)[0x8048855] 
./test(foo+0x33)[0x804888a]
./test(main+0xae)[0x8048962]

(i.e. the back trace generated by my handler, with my function calls at the bottom).

However, when run on the ARM platform, I get:

Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
*** Error in `/opt/bin/test-arm': double free or corruption (fasttop): 0x015b6008 ***
Error: Signal 6; 3 frames found:
/opt/bin/test-arm(handler_segv+0x24)[0x8868]
/lib/libc.so.6(__default_sa_restorer_v2+0x0)[0xb6e6c150]
/lib/libc.so.6(gsignal+0x34)[0xb6e6af48]

backtrace() finds only 3 frames, and they are only the signal handler and something in libc (not useful)!

I found a mailing list post which said:

If you link with the debugging C library, -lc_g, you'll get debugging info back past abort().

This might be relevant, but -lc_g doesn't work with my toolchain (ld: cannot find -lc_g).

The backtrace works fine on ARM if I generate a seg fault instead (e.g. changing the crashme() function to use "char *p = NULL; *p = 0;" instead of the double free).

Any ideas or suggestions for other ways to get a back trace?

[--EDIT--]

I tried some MALLOC_CHECK_ options as suggested in the comments, but the only effect was to change whether the abort was generated. Here is the output from three runs on the ARM:

# MALLOC_CHECK_=0 /opt/bin/test-arm
Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
---->Crashed (shouldn't get to here)...
-->Test finished (shouldn't get to here!)


# MALLOC_CHECK_=1 /opt/bin/test-arm
Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
*** Error in `/opt/bin/test-arm': free(): invalid pointer: 0x015b2008 ***
---->Crashed (shouldn't get to here)...
-->Test finished (shouldn't get to here!)


# MALLOC_CHECK_=2 /opt/bin/test-arm
Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
Error: Signal 6; 3 frames found:
/opt/bin/test-arm(handler_segv+0x24)[0x8868]
/lib/libc.so.6(__default_sa_restorer_v2+0x0)[0xb6e24150]
/lib/libc.so.6(gsignal+0x34)[0xb6e22f48]
#

MALLOC_CHECK_=0: No error message (double free is ignored!)

MALLOC_CHECK_=1: Error message, but program continues

MALLOC_CHECK_=2: Error message and ABRT signal; useless backtrace generated (this is the default behaviour!)

My cross compiler reports: gcc version 4.6.1 (Sourcery CodeBench Lite 2011.09-70). The target device has Linux kernel version 3.8.8.

Jeremy
  • Have you taken a look at: http://www.gnu.org/software/libc/manual/html_node/Backtraces.html It gives an example of how backtracing can be used without the need for gdb. Let me know if this helps, thanks. – KillaBytes Jul 21 '15 at 01:44
  • @Kozmik yes, already using pretty much that (see question and attached example code). However it doesn't work correctly for an ABRT caused by double free. – Jeremy Jul 21 '15 at 01:55
    Could you state what you're asking for as briefly as possible? I'm a bit confused about what it is you are really asking for help with. – KillaBytes Jul 21 '15 at 02:27
  • "How (on ARM platform) do I get a useful back trace for an abort signal caused by a double free?". Using the 'backtrace()' function I only get three frames, one from the signal handler and two from libc, which are not useful since I am trying to find out where (in my code) the double free is occurring. Note the back trace DOES work properly when the code is run on my Ubuntu desktop so it seems to be an issue with the ARM compiler. – Jeremy Jul 21 '15 at 03:51
  • You can try setting MALLOC_CHECK_ to '1' or '2' and see if this helps. You should see early aborts or error messages which will help you debug. This along with backtrace should help you out. http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html – Arun Valiaparambil Jul 22 '15 at 05:56
  • @Arun: I tried MALLOC_CHECK_ as suggested, but this only changed whether any error message and/or abort signal was generated by the double-free; if the abort occurs, I still get a useless backtrace that shows only libc (not my code) - See edited question above for the output. – Jeremy Jul 22 '15 at 23:01
  • `fprintf(stderr, "Error: Signal ...` : you know that `printf()` and friends are not signal-safe? – wildplasser Jan 17 '18 at 11:53
  • I can relate that the same thing occurs on my Raspberry Pi. Compiling with the additional flags doesn't change anything. – Jacajack Mar 25 '18 at 20:09
  • Running into the exact same problem. Did you ever get it working? Did you have a chance to try @itaych 's suggestion? – Pau Guillamon Apr 18 '18 at 09:48
  • I haven't tried @itaych 's solution, but I think it is probably correct, i.e. you also have to build libstdc++ and similar libraries with the appropriate flags set, and they probably weren't set in the toolchain we were using. – Jeremy Apr 19 '18 at 20:34
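Following up on wildplasser's comment: the handler in the question calls fprintf(), which is not async-signal-safe and is especially risky after heap corruption such as a double free. Below is a minimal sketch of a safer handler, assuming glibc's <execinfo.h>; `install_crash_handlers`, `crash_handler` and `write_str` are hypothetical names. It writes only with write(2) and backtrace_symbols_fd() (which, per the glibc manual, does not call malloc), primes backtrace() once at startup because the backtrace(3) man page warns that the first call may load libgcc dynamically, and re-raises the signal so the process still dies with it. This does not by itself fix the short ARM backtrace; it only removes one source of trouble from the handler.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write a NUL-terminated string using only write(2), which is
   async-signal-safe (unlike fprintf). */
static void write_str(int fd, const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    ssize_t rc = write(fd, s, len);
    (void)rc; /* nothing useful to do on failure inside a crash handler */
}

static void crash_handler(int sig)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    write_str(STDERR_FILENO, sig == SIGABRT ? "Caught SIGABRT; backtrace:\n"
                                            : "Caught SIGSEGV; backtrace:\n");
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* SA_RESETHAND already restored the default action, so re-raising
       terminates the process with the original signal. */
    raise(sig);
}

/* Hypothetical helper: returns 0 on success, -1 if sigaction() fails. */
int install_crash_handlers(void)
{
    /* backtrace(3): the first call may load libgcc dynamically, which can
       call malloc(); do it once here, outside any signal handler. */
    void *warmup[1];
    (void)backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND; /* one-shot handler */

    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGABRT, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

A call to install_crash_handlers() at the start of main() would replace the two signal() calls in the test program.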

2 Answers


It appears you have done sufficient research to know that you need the switches -funwind-tables and -fasynchronous-unwind-tables in your compiler command line. In practice either one of them seems sufficient, but without them backtracing clearly doesn't work at all. Now, the trouble with things like SIGABRT is that the backtrace must traverse stack frames that were generated by libc functions such as abort and gsignal, and it fails because libc is not built with either of those switches (in any distribution that I know of).
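As far as I know, glibc's backtrace() on ARM does not follow frame pointers; it loads libgcc and calls its table-driven unwinder, _Unwind_Backtrace(), which is why every frame on the path, including those inside libc, needs unwind tables. The sketch below makes the same walk directly through GCC's <unwind.h>, to illustrate the mechanism rather than to replace backtrace(); `unwind_backtrace`, `record_frame` and `struct unwind_state` are hypothetical names. The unwinder cannot step past a function that has no unwind information, so that function's callers never reach the callback, which would be consistent with the short ARM trace in the question.

```c
#include <stdint.h>
#include <unwind.h>

/* Hypothetical state threaded through the unwinder callback. */
struct unwind_state {
    void **frames; /* output: one return address per frame */
    int max;       /* capacity of frames[] */
    int count;     /* frames recorded so far */
};

/* Invoked by _Unwind_Backtrace() once per frame it can describe from the
   unwind tables; returning anything but _URC_NO_REASON stops the walk. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct unwind_state *st = arg;
    uintptr_t ip = (uintptr_t)_Unwind_GetIP(ctx);

    if (ip == 0 || st->count >= st->max)
        return _URC_END_OF_STACK; /* no more usable frames, or buffer full */
    st->frames[st->count++] = (void *)ip;
    return _URC_NO_REASON; /* keep walking */
}

/* Hypothetical helper: fills frames[] like backtrace() and returns the count.
   The walk cannot get past a function that lacks unwind information. */
int unwind_backtrace(void **frames, int max)
{
    struct unwind_state st = { frames, max, 0 };
    (void)_Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```

The filled array can be passed to backtrace_symbols_fd() exactly like the output of backtrace().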

While it would be nice to petition the maintainers of Sourcery CodeBench to build their distribution with that option, the only immediate solution is to build libc yourself, with either or both of those flags set (in my experience just -funwind-tables is enough). If you also need a stack trace in case of catching an unhandled exception (via std::set_terminate) then you will also need to rebuild libstdc++.

At my workplace we needed backtraces for both cases (SIGABRT and unhandled exceptions), and since libstdc++ is part of the toolchain we rebuilt the toolchain ourselves. The tool crosstool-NG makes this relatively easy to do. In the configuration utility (./ct-ng menuconfig) we went to the Target Options section and set Target CFLAGS (which sets the build variable TARGET_CFLAGS) to -funwind-tables. The resulting toolchain (more specifically, the libc and libstdc++ from the resulting toolchain build) gives us a full backtrace in nearly all cases.

I've found one case where we still don't get a full backtrace: if the crash occurs within a function that was originally written in assembly, such as memcpy (unfortunately this is not an uncommon occurrence). Perhaps some option needs to be passed to the assembler, but I didn't have the time to investigate this further.

itaych
  • Thanks, that's an interesting angle that I had not considered. Unfortunately I'm not in a position to test your solution at the moment. – Jeremy Jan 17 '18 at 20:00
  • So I managed to compile glibc 2.28 with -funwind-tables and -fasynchronous-unwind-tables and test the code above (on ubuntu linaro 16.04 armhf). I had to use the following article to be sure I was linking against my custom-built glibc. https://stackoverflow.com/questions/10763394/how-to-build-a-c-program-using-a-custom-version-of-glibc-and-static-linking All of this to no avail: no stack trace with any MALLOC_CHECK_ level [0 .. 2]. Any thoughts on what I might have missed, or does stack tracing just not work on ARM? – Dave McMordie Dec 01 '18 at 23:27
  • @DaveMcMordie - running the code from the question, on an ARM, the stack trace has 7 entries and ends with "/lib/libc.so.6(__libc_start_main+0x114)[0xf70fecfc]" and looks complete, and the same as what I get on x86. Make sure you're linking with your custom glibc at runtime - if they're not placed in the standard location on your target system (/lib/ I guess) use LD_LIBRARY_PATH. Also make sure that your own project is also compiled with the flags -funwind-tables -fasynchronous-unwind-tables -g -rdynamic . – itaych Dec 06 '18 at 09:27
  • @itaych what version of glibc have you linked against and how was it built? I am certain I am linking against the correct glibc. The default one actually does better-- I get five frames ending at abort. My configure for glibc: ../configure --prefix=/opt/lib CFLAGS='-mapcs-frame -rdynamic -funwind-tables -fasynchronous-unwind-tables -fno-omit-frame-pointer -g -O3' libc_cv_ctors_header=yes – Dave McMordie Dec 10 '18 at 16:40
  • @DaveMcMordie The Glibc version is 2.26. We built it along with the GCC 5.1 toolchain using ct-ng as detailed in my answer above. – itaych Dec 16 '18 at 14:39
    @itaych I confirm your answer and I have to retract my certainty that I was linking correctly. I am now able to get 13 frames. Turns out linking correctly against a custom glibc is rather tricky and will only work correctly on programs with no other dependencies (not our case). I have actually reverted to rebuilding the debian package (ie. apt-get source libc6-dev) with the modified cflags in the debian/rules file. Thanks very much for taking the time to report your findings! – Dave McMordie Jan 10 '19 at 21:55
  • Very useful comment. Note that it must be possible in theory without rebuilding the toolchain though. I have the same issue as Jeremy explains (perfectly working backtrace decoding on x86, not on arm) But if I debug my arm application with gdbserver, it does succeed in fully decoding all the task frames. I guess gdb is more clever than the "backtrace_symbols" function that we compile with. – Arnout Aug 14 '19 at 09:06

This is because unwinding through signal handlers is broken in glibc on ARM. I dug into this a few years back and managed to create a working standalone fix. The hard part was digging through the undocumented bowels of exception handling in glibc; after that, the fix was simple, bordering on trivial.

I posted this to the glibc mailing list, as a reply to an old thread about this problem, in the hope that a glibc dev would take my standalone fix as a guide to fix it in glibc proper, but this never happened.

Recently I tested it again: it turns out that the problem still hasn't been fixed in glibc, and due to changes in glibc my fix no longer works. Update: I've fixed it!

Matthijs
    I'm glad it wasn't just me going crazy! Thanks for posting your solution; next time I'm working on the ARM platform, I'll need to try it. Hopefully someone finds it useful! – Jeremy Feb 13 '20 at 20:39
  • @Matthijs I tried your library but unfortunately it didn't work. In the SIGABRT case it just prints 3 frames consisting of handler_segv, then `sa_restorer_v2.S` from your library, then some nonsense address from libc (`addr2line` maps it to `strfmon_l.c` which makes no sense). If I trigger the abort via gdb, that **does** print all frames, so there _is_ some way to get that info... – dqbydt Mar 04 '21 at 00:34
  • Backtrace from SIGSEGV does match that shown by gdb. – dqbydt Mar 04 '21 at 00:37
  • addr2line wants an address relative to the start of the executable, but the location of the executable in ram is randomized for security (ASLR) so feeding actual runtime addresses directly into addr2line is not going to work. – Matthijs Mar 04 '21 at 03:45
  • It's not really clear what you're doing, and this comment thread is probably not the best place for a detailed discussion. Feel free to open an issue on github with a sufficiently detailed explanation of the problem you're having. – Matthijs Mar 04 '21 at 03:52
  • @Matthijs, you wrote "Update: I've fixed it!" Does this mean it is now fixed in mainstream? Can you provide a link to the thread? – eDeviser May 17 '23 at 10:17
  • @eDeviser no I meant I fixed my code to support current glibc. – Matthijs May 18 '23 at 23:21
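On Matthijs's point about ASLR: a runtime address only becomes something addr2line understands after subtracting the load bias of the module that contains it. A minimal sketch, assuming glibc's dl_iterate_phdr() from <link.h> (the names `lookup_addr`, `find_module_cb` and `struct addr_lookup` are hypothetical), finds the PT_LOAD segment containing an address and reports `addr - dlpi_addr`. Per the dl_iterate_phdr(3) man page, the main program is visited first and has an empty `dlpi_name`; for a non-PIE executable its bias is 0, so the offset equals the raw address. dl_iterate_phdr() is not on POSIX's async-signal-safe list, so this is better suited to post-processing a saved trace than to running inside the crash handler.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query/result record for mapping a runtime address to a module. */
struct addr_lookup {
    uintptr_t addr;     /* input: runtime address, e.g. from backtrace() */
    const char *module; /* output: dlpi_name ("" means the main program) */
    uintptr_t offset;   /* output: addr minus the module's load bias */
    int found;          /* output: 1 if a loaded segment contains addr */
};

static int find_module_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct addr_lookup *q = data;
    (void)size;

    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = (uintptr_t)info->dlpi_addr + ph->p_vaddr;
        if (q->addr >= start && q->addr < start + ph->p_memsz) {
            q->module = info->dlpi_name;
            q->offset = q->addr - (uintptr_t)info->dlpi_addr;
            q->found = 1;
            return 1; /* non-zero return stops dl_iterate_phdr() */
        }
    }
    return 0; /* keep looking in the next module */
}

/* Hypothetical helper: returns 1 and fills *out if addr is in a loaded module. */
int lookup_addr(uintptr_t addr, struct addr_lookup *out)
{
    out->addr = addr;
    out->module = NULL;
    out->offset = 0;
    out->found = 0;
    dl_iterate_phdr(find_module_cb, out);
    return out->found;
}
```

The offset can then be passed to addr2line together with the matching module (`addr2line -e <module> <offset>`), using the executable itself when `module` is empty.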