
I am trying to troubleshoot a locally compiled daemon on Linux. The program crashes with signal 11 (SIGSEGV), and I need a stack backtrace and the values of variables to fix the bug.

However, for some reason the Linux kernel does not save a core dump; it simply logs the following (not giving away the program name for now):

Apr  8 15:22:54 machinename kernel: [ 5032.337089] traps: program[4121] general protection ip:7ff47cbf9614 sp:7ff45f68abb8 error:0
Apr  8 15:22:54 machinename kernel: [ 5032.337110]  in libc-2.19.so[7ff47cb7d000+1a1000]

I have already tried to ensure the criteria in core(5) are satisfied, but to no avail.
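
For reference, the resource-limit part of that check is best made against the running daemon rather than the shell that started it. The following is a minimal sketch, assuming Python 3 on Linux; `core_limits` and `core_limits_for_pid` are hypothetical helper names, and the sketch covers only the RLIMIT_CORE criterion from core(5), not the others.

```python
import re


def core_limits(limits_text):
    """Return (soft, hard) 'Max core file size' from /proc/<pid>/limits text.

    Each value is an int (bytes) or the string 'unlimited'.
    Returns None if the line is not present.
    """
    match = re.search(
        r"^Max core file size\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE
    )
    if match is None:
        return None

    def parse(value):
        return value if value == "unlimited" else int(value)

    return parse(match.group(1)), parse(match.group(2))


def core_limits_for_pid(pid):
    """Read and parse /proc/<pid>/limits for a running process."""
    with open(f"/proc/{pid}/limits") as handle:
        return core_limits(handle.read())
```

Calling `core_limits_for_pid(4121)` (the PID from the log above) would return the daemon's own limits; a soft limit of 0 means no core file is written, whatever the other settings are.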

Is there a way to make the Linux kernel report or log why it is not producing a core dump?

Is there any other way to troubleshoot that situation?

Note that this differs from other questions on the topic by: 1. Not being specific to a named program or library and its idiosyncrasies. 2. Looking for answers that will be of general use to other developers hit by this somewhat confusing kernel behaviour.

jb_dk
  • Related: my Q&A: [Ask Ubuntu: Where do I find core dump files, and how do I view and analyze the backtrace (stack trace) in one?](https://askubuntu.com/q/1349047/327339) – Gabriel Staples Apr 08 '22 at 14:20
  • If you end up figuring it out, please answer your own question too please. You can mark it as correct as well. – Gabriel Staples Apr 08 '22 at 14:41
  • Sorry Mr. Staples, but your Q&A only lists things I have done (setting ulimits according to core(5) and confirming them via /proc/pid/limits) and things that cannot be easily done for a daemon (running it under gdb directly). – jb_dk Apr 08 '22 at 15:04
  • I understand. I'm not a core dump expert yet, so I wasn't sure if anything there would help or not for your case. – Gabriel Staples Apr 08 '22 at 15:41

1 Answer


I finally solved this by building a custom kernel with extra code to log the exact reason for writing or not writing a core dump (this really should be default behavior, but isn't). It turned out that one of the documented criteria (suid_dumpable) was being applied even though the daemon is not an suid binary. It merely drops root when daemonizing, which from a security perspective is rather the opposite of an suid binary gaining privileges.

So I changed suid_dumpable to 1, got a core dump file, and copied it to the machine with the source code. I then struggled to get gdb to load the symbols (there is a lot of unclear documentation on that too), but eventually I got close enough to suspect a cause and implement a workaround in the daemon code.
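
For anyone who wants to check or change the setting from a script, here is a minimal sketch, assuming Python 3 on Linux. The function names are hypothetical; the meanings of the values are paraphrased from proc(5) and core(5). Writing the file requires root, has the same effect as `sysctl -w fs.suid_dumpable=1`, and does not persist across reboots.

```python
SUID_DUMPABLE_PATH = "/proc/sys/fs/suid_dumpable"

# Meanings paraphrased from proc(5) / core(5).
SUID_DUMPABLE_MODES = {
    0: "default: no core dump for processes that changed credentials "
       "or were started from a set-user-ID binary",
    1: "debug: all processes dump core when possible",
    2: "suidsafe: such processes dump core readable by root only",
}


def describe_suid_dumpable(value):
    """Return a short description of a suid_dumpable value."""
    return SUID_DUMPABLE_MODES.get(value, f"unknown value {value}")


def read_suid_dumpable(path=SUID_DUMPABLE_PATH):
    """Read the current system-wide setting as an int."""
    with open(path) as handle:
        return int(handle.read().strip())


def write_suid_dumpable(value, path=SUID_DUMPABLE_PATH):
    """Set the system-wide value; needs root when used on the real path."""
    if value not in SUID_DUMPABLE_MODES:
        raise ValueError(f"suid_dumpable must be 0, 1 or 2, got {value!r}")
    with open(path, "w") as handle:
        handle.write(f"{value}\n")


if __name__ == "__main__":
    current = read_suid_dumpable()
    print(current, "-", describe_suid_dumpable(current))
```

Value 1 is convenient for debugging, but it also lets privileged processes write user-readable core files, so it is probably best set back to 0 once the dump has been captured.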

So the lesson for others: /proc/sys/fs/suid_dumpable also applies to binaries that start as root and drop to an unprivileged user ID, even though the setting exists to stop suid binaries from dumping user-readable core files containing privileged information obtained under a privileged account.
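
The per-process side of this is the "dumpable" attribute described in prctl(2): it is reset to the value of suid_dumpable whenever the process's effective user or group ID changes, which is exactly what happens when a daemon drops root. Below is a minimal sketch of inspecting that attribute from inside a process, assuming Python 3 with ctypes on Linux. The helper names are hypothetical, the constant is copied from `<linux/prctl.h>`, and only two of the core(5) criteria are checked.

```python
import ctypes
import os
import resource

PR_GET_DUMPABLE = 3  # value from <linux/prctl.h>


def get_dumpable():
    """Return the calling process's dumpable attribute (0, 1 or 2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)  # unused arguments, passed as unsigned long
    result = libc.prctl(PR_GET_DUMPABLE, zero, zero, zero, zero)
    if result < 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))
    return result


def core_dump_blockers(dumpable, soft_core_limit):
    """List which of the two conditions checked here would block a core file."""
    reasons = []
    if dumpable == 0:
        reasons.append("dumpable attribute is 0 (see fs.suid_dumpable)")
    if soft_core_limit == 0:
        reasons.append("RLIMIT_CORE soft limit is 0")
    return reasons


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(core_dump_blockers(get_dumpable(), soft))
```

prctl(2) also documents PR_SET_DUMPABLE, which a daemon could in principle call after dropping root to re-enable dumps for itself; the fix described above changed the system-wide setting instead.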

jb_dk