
When a process crashes, I want to be able to invoke gdb (or a similar debugger) against it in that crashed-but-not-cleaned-up state. Post-morteming a core dump often gives enough information, but sometimes I want to explore the running state further, possibly suppressing the immediate fault and running a little further. It isn't always appropriate to run the process under gdb from the outset (e.g. where the invocation is complex or the bug is absurdly timing-sensitive).

What I'm describing is basically the just-in-time debugging facility that is exposed on MS Windows through the "AEDebug" registry key: leaving the faulting thread suspended while doing something diagnostic. On non-developer Windows PCs this is commonly set to a crash diagnostic mechanism (formerly "Dr Watson"), for which the Ubuntu equivalent seems to be "apport".

I did find an old mail thread (2007) which refers to this question "popping up every now and then", so possibly it exists but is described in a way that eludes my searches?

Tom Goodfellow
  • For your applications, you could always add a signal handler on SIGSEGV. – Stephane Chazelas Mar 18 '14 at 14:18
  • @StephaneChazelas - true, but unfortunately I've inherited a zoo of test executables slaved through a somewhat inscrutable perl script (it likes to create symbolic links, sometimes recursively :-) ). And in general I think it will be a useful tool to add to my toolbox. – Tom Goodfellow Mar 18 '14 at 15:03
  • You may want to have a look at what `valgrind` does as I believe it can invoke gdb on some events. – Stephane Chazelas Mar 18 '14 at 16:35
  • @StephaneChazelas Thanks for the valgrind steer; I think the association is that it provides an [internal gdb server to make it convenient to inspect the traps it raises](http://tromey.com/blog/?p=731). So that's not the specific magic bullet I was seeking, but actually a better bullet to have learned about. – Tom Goodfellow Mar 19 '14 at 10:12

3 Answers


I don't know if such a feature exists, but as a hack, you could LD_PRELOAD something that adds a handler on SIGSEGV that calls gdb:

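```sh
cat > handler.c << 'EOF'
#include <stdlib.h>
#include <signal.h>
void gdb(int sig) {
  // In the shell started by system(), $PPID is the PID of the crashing process
  system("exec xterm -e gdb -p \"$PPID\"");
  abort();
}

// _init runs when the library is loaded (hence -nostartfiles below)
void _init() {
  signal(SIGSEGV, gdb);
}
EOF
gcc -g -fpic -shared -o handler.so -nostartfiles handler.c
```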

And then run your applications with:

```sh
LD_PRELOAD=/path/to/handler.so your-application
```

Then, upon a SEGV, it will run gdb in an xterm. If you run bt there, you'll see something like:

```
(gdb) bt
#0  0x00007f8c58152cac in __libc_waitpid (pid=8294,
    stat_loc=stat_loc@entry=0x7fffd6170e40, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:31
#1  0x00007f8c580df01b in do_system (line=<optimized out>)
    at ../sysdeps/posix/system.c:148
#2  0x00007f8c58445427 in gdb (sig=11) at ld.c:4
#3  <signal handler called>
#4  strlen () at ../sysdeps/x86_64/strlen.S:106
#5  0x00007f8c5810761c in _IO_puts (str=0x0) at ioputs.c:36
#6  0x000000000040051f in main (argc=1, argv=0x7fffd6171598) at a.c:2
```

Instead of running gdb, you could also have the process suspend itself (kill(getpid(), SIGSTOP)) or call pause(), and then start gdb yourself at your leisure.
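A minimal sketch of that suspend-yourself variant, written as an LD_PRELOAD library like the one above. The file name `stopme.c`, the message text and the use of GCC's `__attribute__((constructor))` (instead of `_init` with `-nostartfiles`) are assumptions of this sketch. The handler only calls functions listed as async-signal-safe in signal-safety(7) (`write`, `getpid`, `raise`), and `SA_RESETHAND` makes a resumed process die normally instead of re-entering the handler:

```c
/* stopme.c (hypothetical name): freeze a crashing process so that gdb can
 * be attached later. Build, e.g.: gcc -g -fpic -shared -o stopme.so stopme.c */
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Print a non-negative number with write(2) only (async-signal-safe). */
static void write_num(long n)
{
    char buf[24];
    size_t i = sizeof buf;

    if (n <= 0)
        buf[--i] = '0';              /* pids are positive; just be defensive */
    while (n > 0) {
        buf[--i] = (char)('0' + n % 10);
        n /= 10;
    }
    (void)write(STDERR_FILENO, buf + i, sizeof buf - i);
}

static void stop_on_crash(int sig)
{
    static const char msg[] = "crashed; process stopped, attach with: gdb -p ";

    (void)sig;
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    write_num((long)getpid());
    (void)write(STDERR_FILENO, "\n", 1);
    raise(SIGSTOP);                  /* sleep until a debugger or SIGCONT resumes us */
    /* SA_RESETHAND already restored SIG_DFL, so if we are just continued the
     * faulting instruction re-runs and the default action (core) applies. */
}

__attribute__((constructor))
static void install_stop_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = stop_on_crash;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;      /* back to SIG_DFL once the handler runs */
    sigaction(SIGSEGV, &sa, NULL);
}
```

Run the program with `LD_PRELOAD=/path/to/stopme.so`; a crashed process then shows up as stopped (state `T` in ps) until you attach with the printed command. On systems with Yama `ptrace_scope=1`, attaching from an unrelated shell may additionally need sudo or the `PR_SET_PTRACER_ANY` call used in another answer here.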

That approach won't work if the application installs a SEGV handler itself or is setuid/setgid (the dynamic linker restricts LD_PRELOAD for those)...

That's the approach used by @yugr for his libdebugme tool, which you could use here as:

```sh
DEBUGME_OPTIONS='xterm:handle_signals=1' \
  LD_PRELOAD=/path/to/libdebugme.so your-application
```
Stephane Chazelas
  • That's certainly pleasantly close, since (I think!) it should be inherited down whatever child process tree is involved. A supplementary question: it seems that system() isn't assuredly safe in a signal handler (http://man7.org/linux/man-pages/man7/signal.7.html). Is it generally safe anyway and a widely-used idiom, or, for paranoid safety, should it signal a waiting thread? – Tom Goodfellow Mar 18 '14 at 15:51
  • And thanks also for the SIGSTOP/pause suggestion - leaving failures peacefully sleeping might be a good default for soak testing (leave it hammering overnight, post-mortem the small set of repros in the morning) – Tom Goodfellow Mar 18 '14 at 16:01
  • @TomGoodfellow You could do the fork/exec directly instead—both of those are on the list. I'd suggest fork/exec to start gdb in the child, and have the parent SIGSTOP itself. Even that list can't be guaranteed safe after catching a SIGSEGV, by the way—well, depending on what generated the segfault. (E.g., the program could have overwritten random bytes of memory, and hit jump tables set up by the dynamic linker.) – derobert Mar 18 '14 at 16:24
  • @TomGoodfellow I don't know if it's a widely-used idiom, I just came up with it. I've never really needed anything like that as core dumps have always been enough in that kind of situations for me. – Stephane Chazelas Mar 18 '14 at 16:35
  • @derobert - thanks (for both points). A dreadful enough bug will certainly hamper any debugging (eg a too-big memset of a local variable followed by return) but luckily my case is probably just logic flaws tripping assert()s – Tom Goodfellow Mar 19 '14 at 09:37
  • Sorry; very belated accept of this correct answer (once it was working the useful results dropped on top of my mental to-do stack :-) ) – Tom Goodfellow Aug 26 '14 at 07:00
  • This approach serves as the base for the [libdebugme](https://github.com/yugr/libdebugme) tool. – yugr Dec 14 '17 at 09:33
  • @yugr, thanks. I've added a link. See also the pull request I just sent and [the OP's PR_SET_PTRACER_ANY](/a/25499817) to avoid the yama warning. – Stephane Chazelas Dec 14 '17 at 11:11
  • @StephaneChazelas Thanks, both suggestions make sense. I've merged the PR and I'll take a look at prctl later in the evening. – yugr Dec 14 '17 at 11:38
  • @yugr, you may also want to execlp(xterm) instead of hardcoding its path. Maybe also allow `DEBUGME_OPTIONS=xterm=konsole` for the user to specify their preferred X11 terminal emulator, though note that not all accept passing shell code, and some even do shell-like command-line parsing. – Stephane Chazelas Dec 14 '17 at 12:07
  • @StephaneChazelas The problem with `execlp` is that it's not signal-safe (I could locate the executable at startup though, similarly to what's done e.g. [here](https://github.com/yugr/valgrind-preload/blob/610d9c491155926812d74925aa7d6315b16e66b8/src/pregrind.c#L358)). I didn't implement arbitrary terminals because they all have different command-line interfaces (I should probably allow the user to specify the format of the terminal command line in `DEBUGME_OPTIONS`, but it would be quite some work). I added the `PR_SET_PTRACER_ANY` trick; it works like a charm on Xenial. – yugr Dec 15 '17 at 09:49
  • @yugr, you may still want to add a check for `kernel.yama.ptrace_scope` greater than one. – Stephane Chazelas Dec 15 '17 at 10:04
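The `kernel.yama.ptrace_scope` check suggested in that last comment could look roughly like the following sketch, run once at load time (e.g. from `_init`) rather than inside the signal handler. The helper names are hypothetical; the scope values follow the kernel's Yama documentation (0 classic, 1 restricted, 2 admin-only, 3 no attach). Under scope 1 a process may only trace its own descendants, and the gdb launched from the handler is a descendant of the crashing process rather than an ancestor, so the process has to declare a tracer with `PR_SET_PTRACER`; above 1 that no longer helps:

```c
#include <fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Returns the Yama ptrace_scope value, or -1 if Yama isn't present. */
static int read_ptrace_scope(void)
{
    char c;
    int fd = open("/proc/sys/kernel/yama/ptrace_scope", O_RDONLY);

    if (fd < 0)
        return -1;                  /* no Yama: classic same-uid rules apply */
    ssize_t n = read(fd, &c, 1);
    close(fd);
    if (n != 1 || c < '0' || c > '9')
        return -1;
    return c - '0';
}

/* Returns 1 if an unprivileged gdb launched from the crash handler
 * should be able to attach, 0 otherwise. */
static int jit_attach_possible(void)
{
    int scope = read_ptrace_scope();

    if (scope > 1)
        return 0;                   /* needs CAP_SYS_PTRACE, or attach disabled */
    if (scope == 1) {
        /* gdb will be our descendant, so explicitly allow any tracer.
         * PR_SET_PTRACER_ANY comes from <linux/prctl.h> via <sys/prctl.h>. */
        if (prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0) != 0)
            return 0;
    }
    return 1;
}
```

If `jit_attach_possible()` returns 0, the preload could fall back to just stopping the process or letting it dump core instead of spawning a gdb that will fail to attach.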

Answering my own question to include the fleshed-out code I derived from the true answer (@Stephane Chazelas above). The only real changes from the original answer are:

  1. setting PR_SET_PTRACER_ANY to allow gdb to attach
  2. a little more (futile?) effort to avoid libc code, in the hope of still working for (some) heap corruptions
  3. included SIGABRT because some of the crashes are assert()s

I've been using it with Linux Mint 16 (kernel 3.11.0-12-generic).

```c
/* LD_PRELOAD library which launches gdb "just-in-time" in response to a process SIGSEGV-ing
 * (or SIGABRT-ing, e.g. from a failed assert())
 * Compile with:
 *
 * gcc -g -fpic -shared -nostartfiles -o jitdbg.so jitdbg.c
 *
 * then put in LD_PRELOAD before running process, e.g.:
 *
 * LD_PRELOAD=~/scripts/jitdbg.so defective_executable
 */

#include <unistd.h>
#include <signal.h>
#include <sys/prctl.h>


void gdb(int sig) {
  if(sig == SIGSEGV || sig == SIGABRT)
    {
      // Allow any process to ptrace us. Done before fork() so the permission
      // is already in place by the time the child's gdb tries to attach.
      prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);

      pid_t cpid = fork();
      if(cpid == -1)
        return;   // fork failed, we can't help, hope core dumps are enabled...
      else if(cpid != 0)
        {
          // Parent
          raise(SIGSTOP);  // wait for child's gdb invocation to pick us up
          // If we are simply continued, restore the default action so the
          // re-executed fault terminates us (dumping core if enabled) instead
          // of re-entering this handler and spawning another gdb
          signal(sig, SIG_DFL);
        }
      else
        {
          // Child - now try to exec gdb in our place attached to the parent

          // Avoiding using libc since that may already have been stomped, so building the
          // gdb args the hard way ("gdb dummy PID"), first copy
          char cmd[100];
          const char* stem = "gdb _dummy_process_name_                   ";  // 18 trailing spaces to allow for a 64 bit proc id
          const char*s = stem;
          char* d = cmd;
          while(*s)
            {
            *d++ = *s++;
            }
          *d-- = '\0';
          char* hexppid = d;

          // now backfill the trailing space with the hex parent PID - not
          // using decimal for fear of libc maths helper functions being dragged in
          pid_t ppid = getppid();
          while(ppid)
            {
              *hexppid = ((ppid & 0xF) + '0');
              if(*hexppid > '9')
                *hexppid += 'a' - '0' - 10;
              --hexppid;
              ppid >>= 4;
            }
          *hexppid-- = 'x';   // prefix with 0x
          *hexppid = '0';
          // system() isn't listed as safe under async signals, nor is execlp,
          // or getenv. So ideally we'd already have cached the gdb location, or we
          // hardcode the gdb path, or we accept the risk of re-entrancy/library woes
          // around the environment fetch...
          execlp("mate-terminal", "mate-terminal", "-e", cmd, (char*) NULL);
          _exit(127);  // exec failed: don't return into the crashed code
        }
    }
}

void _init() {
  signal(SIGSEGV, gdb);
  signal(SIGABRT, gdb);
}
```
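The handler above formats the parent PID in hex to avoid pulling in arithmetic helpers. As an alternative sketch, a decimal version only needs `/` and `%` on an `unsigned int`, which on common targets such as x86-64 the compiler emits inline rather than as a library call (worth confirming with objdump on your platform). The name `format_pid_dec` is hypothetical; its output could be used to build a plain `gdb -p <pid>` command:

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: writes pid in decimal to buf, NUL-terminated.
 * Returns the number of digits written, or 0 if buf is too small.
 * It makes no library calls, so it can be used from a signal handler. */
static size_t format_pid_dec(pid_t pid, char *buf, size_t len)
{
    char tmp[12];                         /* enough for any 32-bit value */
    size_t n = 0;
    unsigned int v = (unsigned int)pid;   /* pid_t is a 32-bit int on Linux */

    do {
        tmp[n++] = (char)('0' + v % 10);  /* least significant digit first */
        v /= 10;
    } while (v != 0);

    if (n + 1 > len)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];          /* reverse into reading order */
    buf[n] = '\0';
    return n;
}
```

For example, `format_pid_dec(12345, buf, sizeof buf)` fills `buf` with `"12345"` and returns 5.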
Tom Goodfellow

If you are able to anticipate that a particular program will crash, you could start it under gdb.

```
gdb /usr/local/bin/foo
(gdb) run
```

If the program crashes, gdb will catch it and let you continue to investigate.

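If you are not able to predict when and which program will crash, then you could enable core dumps instead. Note that `ulimit -c` sets a per-process limit that is inherited by child processes, so it only affects the current shell and the programs started from it: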

```sh
ulimit -c unlimited
```
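If you control the program (or a preload library like the ones in the other answers), a sketch of an in-process counterpart uses `setrlimit(RLIMIT_CORE, ...)`. An unprivileged process can raise its soft limit only up to the hard limit, so this hypothetical `enable_core_dumps` copies the hard limit rather than insisting on `RLIM_INFINITY`. Where the core ends up is still governed by `/proc/sys/kernel/core_pattern` (on Ubuntu it is typically piped to apport):

```c
#include <sys/resource.h>

/* Hypothetical helper: roughly what `ulimit -c unlimited` does, but from
 * inside the process and limited to what an unprivileged process may do:
 * raise the soft core-size limit up to the hard limit (RLIM_INFINITY when
 * the hard limit is unlimited). Returns 0 on success, -1 on error. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it early (e.g. from `_init` in a preload library) means the limit applies to that process and anything it forks afterwards.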

Start foo and force a core dump of it:

```sh
/usr/local/bin/foo &
kill -11 `pidof foo`   # kill -3 (SIGQUIT) will likely also work
```

A core file should be generated (its name and location depend on /proc/sys/kernel/core_pattern), which you can load into gdb:

```sh
gdb `which foo` -c some.core
```

Red Hat systems sometimes require additional configuration besides the ulimit to enable core dumps:

http://www.akadia.com/services/ora_enable_core.html
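When you are not sure whether that configuration took effect, the wait status of a killed child records whether the kernel dumped core. The sketch below is a C counterpart of the `kill -11` step for a program you started yourself; the helper `kill_and_check_core` is hypothetical, and because `waitpid` only works on your own children it suits a small test harness that launches the program:

```c
#define _DEFAULT_SOURCE                 /* for WCOREDUMP in glibc */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send sig to our child pid (like `kill -11 <pid>`),
 * reap it, and return 1 if the kernel reports a core dump, 0 if it died
 * from sig without one, or -1 on error. */
static int kill_and_check_core(pid_t pid, int sig)
{
    int status;

    if (kill(pid, sig) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;                      /* waitpid only works on our children */
    if (!WIFSIGNALED(status) || WTERMSIG(status) != sig)
        return -1;                      /* it exited, or died some other way */
    return WCOREDUMP(status) ? 1 : 0;   /* 0: check ulimit -c and core_pattern */
}
```

A result of 0 after `kill -11` usually points at the core size limit or the system's core handling configuration rather than at the program.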

spuder