I got a SIGKILL signal in an open-source C++ program, but gdb doesn't tell who sent it or why. Can anyone tell me how to proceed in gdb in this case? Or is plain source code analysis needed to find the root cause?

/home/ubuntu#gdb -p  `pidof e2`
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 2545141
[New LWP 2545201]
[New LWP 2545202]
[New LWP 2545203]
[New LWP 2545204]
[New LWP 2545205]
[New LWP 2545209]
[New LWP 2545210]
[New LWP 2545211]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
__pthread_clockjoin_ex (threadid=139679156389632, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.
(gdb) n
[Thread 0x7f09957fa700 (LWP 2545211) exited]
[Thread 0x7f0995ffb700 (LWP 2545210) exited]
[Thread 0x7f09967fc700 (LWP 2545209) exited]
[Thread 0x7f09977fe700 (LWP 2545204) exited]
[Thread 0x7f099d5bc700 (LWP 2545201) exited]
[Thread 0x7f099d5d4740 (LWP 2545141) exited]
[Thread 0x7f0996ffd700 (LWP 2545205) exited]
[Thread 0x7f099cdbb700 (LWP 2545202) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) info signals
Signal        Stop  Print  Pass to program  Description

SIGHUP        Yes   Yes    Yes              Hangup
SIGINT        Yes   Yes    No               Interrupt
SIGKILL       Yes   Yes    Yes              Killed
  • If you have access to the source code, compile with -g, set a breakpoint at the line where the SIGKILL happens, and then validate the fields – nvn Apr 16 '23 at 06:38
  • @273k bt gives me nothing, as it is out of context now. I know SIGKILL is sometimes sent by the OS. Is it possible to know why the OS sent it? Any memory issue? – myquest6 sh Apr 16 '23 at 06:45
  • @nvm It occurred only once – myquest6 sh Apr 16 '23 at 06:46
  • `...any mem issue` This is more likely an issue in the program. Don't step with `n`; run `bt` instead. – 273K Apr 16 '23 at 06:49
  • @273 How can you tell this is a program issue rather than an OS or memory issue? Any symptoms? I had run `n` before the crash because without it my server does not send a response to the client and no processing happens, hence no crash – myquest6 sh Apr 16 '23 at 06:53
  • It's the program issue. Enable crash dumps. – 273K Apr 16 '23 at 07:09
  • @273 Unfortunately it is running in a Kubernetes pod under Docker, and Docker does not allow me to enable core dumps, so that is not an option for me – myquest6 sh Apr 16 '23 at 07:25
  • Usually running out of memory is the cause of unexpected sigkill – Alan Birtles Apr 16 '23 at 07:44
  • https://stackoverflow.com/questions/26285133/who-sends-a-sigkill-to-my-process-mysteriously-on-ubuntu-server – Alan Birtles Apr 16 '23 at 07:45
  • @AlanBirtles Does that mean my program has a potential memory leak? How can I see this in gdb? And how can I proceed, given that this was a one-time, non-reproducible crash? – myquest6 sh Apr 16 '23 at 08:05
  • It might. Follow the link I posted to see how to find the source of the signal. Gdb won't help you here, as it will only help you debug your handling of the signal, not where the signal is coming from – Alan Birtles Apr 16 '23 at 08:06
  • Thanks @AlanBirtles, it is really a long post. Would you be able to point out where in the thread I should look? – myquest6 sh Apr 16 '23 at 15:17
  • The most upvoted answer is about 10 lines long; that doesn't seem too much to read? – Alan Birtles Apr 16 '23 at 16:22

1 Answer

but gdb doesn't tell who sent it or why.

That's because by the time GDB wakes up to the fact that something happened to the inferior (being debugged) process, that process is already gone.
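To illustrate how little information survives a SIGKILL: whoever waits on the dead process learns only the signal number from the wait status, not the sender or the reason. Below is a minimal Python sketch, assuming a POSIX system; the helper name `kill_and_reap` is made up for this illustration and has nothing to do with the `e2` program.

```python
import signal
import subprocess
import sys


def kill_and_reap() -> int:
    """Start a sleeping child, send it SIGKILL, and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # SIGKILL cannot be caught, blocked, or ignored by the target process.
    child.send_signal(signal.SIGKILL)
    # On POSIX, a negative return code means "terminated by that signal".
    return child.wait()


if __name__ == "__main__":
    status = kill_and_reap()
    print(f"child status: {status} ({signal.Signals(-status).name})")
```

The status carries the signal number and nothing else, which matches what GDB reports: "Program terminated with signal SIGKILL" without any indication of the origin.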

SIGKILL is most often sent by the kernel due to an OOM (out of memory) condition. Look in /var/log/messages (or the equivalent for your distribution, e.g. /var/log/syslog or the output of `dmesg` on Ubuntu); it likely has a message mentioning oom.
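The log check above can be automated. Here is a rough Python sketch that scans kernel log lines for OOM-killer reports and extracts the victim's PID and name. The regular expression is an assumption based on the usual wording ("Out of memory: Killed process ..." or "Memory cgroup out of memory: Killed process ..."); the exact text varies between kernel versions, so treat it as a starting point. The script name `find_oom.py` is hypothetical.

```python
import re
import sys

# Assumed shape of the kernel's OOM-killer report line, for example:
#   "Out of memory: Killed process <pid> (<name>) ..."
#   "Memory cgroup out of memory: Killed process <pid> (<name>) ..."
# Older kernels print "Kill process"; wording varies by kernel version.
OOM_LINE = re.compile(
    r"(?P<scope>Memory cgroup out of memory|Out of memory)"
    r".*?Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]*)\)",
    re.IGNORECASE,
)


def find_oom_kills(lines):
    """Return (pid, process name, is_cgroup_limit) for each OOM-kill line."""
    hits = []
    for line in lines:
        m = OOM_LINE.search(line)
        if m:
            is_cgroup = m.group("scope").lower().startswith("memory cgroup")
            hits.append((int(m.group("pid")), m.group("name"), is_cgroup))
    return hits


if __name__ == "__main__":
    # Assumed usage: dmesg -T | python3 find_oom.py
    for pid, name, is_cgroup in find_oom_kills(sys.stdin):
        kind = "cgroup limit" if is_cgroup else "system-wide"
        print(f"OOM kill ({kind}): pid={pid} name={name}")
```

Since the process in the question runs in a Kubernetes pod, a "Memory cgroup" hit would suggest the container's memory limit was exceeded rather than the whole node running out of memory. The PIDs in the kernel log are host PIDs, so run this on the node, not inside the container.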

You may not have enough memory to run this program: you may need either a larger system or different program parameters so that it doesn't need as much memory.
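To judge whether the program really needs more memory than is available, it can help to watch its resident memory over time. The sketch below reads `VmRSS` (current resident set size) and `VmHWM` (peak resident set size) from `/proc/<pid>/status` on Linux. The function names and the script name `rss.py` are made up for this example.

```python
import sys


def parse_vm_fields(status_text):
    """Extract VmRSS and VmHWM (in kB) from /proc/<pid>/status content."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "VmHWM"):
            fields[key] = int(rest.split()[0])  # the value is followed by "kB"
    return fields


def read_memory_kb(pid):
    """Read current (VmRSS) and peak (VmHWM) resident memory of a process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())


if __name__ == "__main__":
    # Assumed usage: python3 rss.py "$(pidof e2)"
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(read_memory_kb(pid))
```

Sampling this periodically and comparing the values with the memory limit configured for the pod would show whether usage grows steadily (suggesting a leak) or simply sits close to the limit under normal load.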

Employed Russian