26

UPDATES on Oct 25:

Now I found out what's causing the problem.

1) The child process kills itself, that's why strace/perf/auditctl cannot track it down.

2) The JNI call to create a process is triggered from a Java thread. When the thread eventually dies, it's also destroying the process that it creates.

3) In my code to fork and execve() a child process, I have the code to monitor parent process death and kill my child process with the following line: prctl( PR_SET_PDEATHSIG, SIGKILL ); My fault that I didn't pay special attention to this flag before b/c it's considered as a BEST PRACTICE for my other projects where child process is forked from the main thread.

4) If I comment out this line, the problem is gone. The original purpose is to kill the child process when the parent process is gone. Even w/o this flag, it's still the correct behavior. Seems like the ubuntu box default behavior.

5) Finally found it's a kernel bug, fixed in kernel version 3.4.0, my ubuntu box from AWS is kernel version 3.13.0-29-generic.

There are a couple of useful links to the issues:

a) http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

b) prctl(PR_SET_PDEATHSIG, SIGNAL) is called on parent thread exit, not parent process exit.

c) https://bugzilla.kernel.org/show_bug.cgi?id=43300

UPDATES on Oct 15:

Thanks so much for all the suggestions. I am investigating from one area of the system to another area. It's hard 2 find a reason.

I am wondering 2 things. 1) why are powerful tools such as strace, auditctl and perf script not able to track down who caused the kill?

2) Is +++ killed by SIGKILL +++ really means its killed from signal?

ORIGINAL POST

I have a long running C process launched from a Java application server in Ubuntu 12 through the JNI interface. The reason I use JNI interface to start a process instead of through Java's process builder is b/c of the performance reasons. It's very inefficient for java process builder to do IPC especially b/c extra buffering introduces very long delay.

Periodically it is terminated by SIGKILL mysteriously. The way I found out is through strace, which says: "+++ killed by SIGKILL +++"

I checked the following:

  1. It's not a crash.
  2. It's not a OOM. Nothing in dmesg. My process uses only 3.3% of 1Gbytes of memory.
  3. Java layer didn't kill the process. I put a log in the JNI code if the code terminates the process, but no log was written to indicate that.
  4. It's not a permission issue. I tried to run as sudo or a different user, both cases causes the process to be killed.
  5. If I run the process locally in a shell, everything works fine. What's more, in my C code for my long-running process, I ignore the signal SIGHUP. Only when it's running as a child process of Java server, it gets killed.
  6. The process is very CPU intensive. It's using 30% of the CPU. There are lots of voluntary context switch and nonvoluntary_ctxt_switches.
  7. (NEW UPDATE) One IMPORTANT thing very likely related to why my process is killed. If the process do some heavy lifting, it won't be killed, however, sometimes it's doing little CPU intensive work. When that happens, after a while, roughly 1 min, it is killed. It's status is always S(Sleeping) instead of R(Running). It seems that the OS decides to kill the process if it was idle most of the time, and not kill the process if it was busy.
  8. I suspect Java's GC is the culprit, however, Java will NEVER garbage collect a singleton object associated with JNI. (My JNI object is tied to that singleton).

I am puzzled by the reason why it's terminated. Does anyone has a good suggestion how to track it down?

p.s.

  1. On my ubuntu limit -a result is:

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 7862
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 65535
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 7862
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    

    I tried to increase the limits, and still does not solve the issue.

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) unlimited
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 65535
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) unlimited
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) unlimited
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    
  2. Here is proc status when I run cat /proc/$$$/status

    Name:   mimi_coso
    State:  S (Sleeping)
    Tgid:   2557
    Ngid:   0
    Pid:    2557
    PPid:   2229
    TracerPid:  0
    Uid:    0   0   0   0
    Gid:    0   0   0   0
    FDSize: 256
    Groups: 0
    VmPeak:   146840 kB
    VmSize:   144252 kB
    VmLck:         0 kB
    VmPin:         0 kB
    VmHWM:     36344 kB
    VmRSS:     34792 kB
    VmData:    45728 kB
    VmStk:       136 kB
    VmExe:       116 kB
    VmLib:     23832 kB
    VmPTE:       292 kB
    VmSwap:        0 kB
    Threads:    1
    SigQ:   0/7862
    SigPnd: 0000000000000000
    ShdPnd: 0000000000000000
    SigBlk: 0000000000000004
    SigIgn: 0000000000011001
    SigCgt: 00000001c00064ee
    CapInh: 0000000000000000
    CapPrm: 0000001fffffffff
    CapEff: 0000001fffffffff
    CapBnd: 0000001fffffffff
    Seccomp:    0
    Cpus_allowed:   7fff
    Cpus_allowed_list:  0-14
    Mems_allowed:   00000000,00000001
    Mems_allowed_list:  0
    voluntary_ctxt_switches:    16978
    nonvoluntary_ctxt_switches: 52120
    
  3. strace shows:

    $ strace -p 22254 -s 80 -o /tmp/debug.lighttpd.txt
    read(0, "SGI\0\1\0\0\0\1\0c\0\0\0\t\0\0T\1\2248\0\0\0\0'\1\0\0(\0\0"..., 512) = 113
    read(0, "SGI\0\1\0\0\0\1\0\262\1\0\0\10\0\1\243\1\224L\0\0\0\0/\377\373\222D\231\214"..., 512) = 448
    sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
    write(1, "SGO\0\0\0\0 \272\1\0\0\t\0\1\253\1\243\273\0\0\0\0'\1\0\0\0\0\0\1\242"..., 454) = 454
    sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
    write(1, "SGO\0\0\0\0 \341\0\0\0\10\0\0\322\1\254Z\0\0\0\0/\377\373R\4\0\17\21!"..., 237) = 237
    read(0, "SGI\0\1\0\0\0\1\0)\3\0\0\t\0\3\32\1\224`\0\0\0\0'\1\0\0\310\0\0"..., 512) = 512
    read(0, "\344u\233\16\257\341\315\254\272\300\351\302\324\263\212\351\225\365\1\241\225\3+\276J\273\37R\234R\362z"..., 512) = 311
    read(0, "SGI\0\1\0\0\0\1\0\262\1\0\0\10\0\1\243\1\224f\0\0\0\0/\377\373\222d[\210"..., 512) = 448
    sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
    write(1, "SGO\0\0\0\0 %!\0\0\t\0\0+\1\243\335\0\0\0\0\27\0\0\0\0\1B\300\36"..., 8497) = 8497
    sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
    write(1, "SGO\0\0\0\0 \341\0\0\0\10\0\0\322\1\254t\0\0\0\0/\377\373R\4\0\17\301\31"..., 237) = 237
    read(0, "SGI\0\1\0\0\0\1\0\262\1\0\0\10\0\1\243\1\224\200\0\0\0\0/\377\373\222d/\200"..., 512) = 448
    sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
    write(1, "SGO\0\0\0\0 \341\0\0\0\10\0\0\322\1\254\216\0\0\0\0/\377\373R\4\0\17\361+"..., 237) = 237
    read(0, "SGI\0\1\0\0\0\1\0\221\0\0\0\t\0\0\202\1\224\210\0\0\0\0'\1\0\0P\0\0"..., 512) = 159
    read(0,  unfinished ...)
    
    +++ killed by SIGKILL +++
    
Community
  • 1
  • 1
Howy
  • 825
  • 2
  • 10
  • 20
  • What's happening to the parent process (the "Java server")? – NPE Oct 09 '14 at 18:14
  • How your java parameters are set up? I mean such options like: -Xms -Xmx and so on. – RaFD Oct 09 '14 at 20:08
  • @NPE, the parent process is running fine. Only the child process is killed. – Howy Oct 09 '14 at 22:05
  • @RaFD, I didn't specify -Xms or -Xmx. My C process is started from JNI interface instead of through Java process builder. The reason why I did that is b/c of the performance inefficiency of java process builder. – Howy Oct 09 '14 at 22:13
  • 1
    "It's very inefficiency for java process builder to do IPC." What sort of IPC are you doing?? – Chris Stratton Oct 09 '14 at 23:05
  • Maybe for testing, start with increase limit values and you will see if the problem still exists. Btw, what is happening in the process before signal (SIGKILL) is delivered (could you paste output of strace e.g. last 20 lines) – RaFD Oct 10 '14 at 10:41
  • @Chris, low latency process pipe stuff. Java ProcessBuilder sucks in that area. I have to write my own JNI module to do that. – Howy Oct 10 '14 at 18:02
  • @RaFD, there is nothing much in strace. The only thing I know it's terminated by sigkill, see my updated original post – Howy Oct 10 '14 at 18:37
  • Since you're already mucking about with JNI, may be this will be helpful: http://stackoverflow.com/questions/8400530/how-can-i-tell-in-linux-which-process-sent-my-process-signal –  Oct 10 '14 at 18:43
  • @Arkadiy, thanks so much. I confirmed SIGKILL can NOT be caught from any code and I am sure from strace it's a SIGKILL signal. Therefore, it's a hard issue to track down. – Howy Oct 10 '14 at 19:58
  • One IMPORTANT thing very likely related to why my process is killed. If the process do some heavy lifting, it won't be killed, however, after sometimes it's doing little CPU intensive work. When that happens, after a while, roughly 1 min, it is killed. It's status is always S(Sleeping) instead of R(Running). It seems that the OS decides to kill the process if it was idle most of the time, and not kill the process if it was busy. – Howy Oct 10 '14 at 22:40
  • @Howy I suggest that you could probably overcome your alleged performance problems by wrapping buffered streams around the process streams. Java really doesn't contribute much overhead in this area. – user207421 Oct 15 '14 at 04:46
  • @EJP, the issue with buffered streams around the process stream is that buffering introduces unnecessary latency and it's performance is poor. I need to do very fast IPC, therefore I built it using JNI. Maybe that's causing the issue why the process gets killed. Thanks – Howy Oct 15 '14 at 17:15
  • @RaFD, I tried to up the ulimit for my system, and that does not seem to help. see my updated original thread. – Howy Oct 15 '14 at 17:37
  • @Howdy, since you are running the process on the same machine and want low latency, you might wish to explore shared memory (via mapped direct buffers) IPC. This will cost 2 busy spin loops for lowest latency unless you wish to go further and implement futex (via JNI) – bestsss Oct 19 '14 at 23:08
  • btw, do you have any cron tasks? – bestsss Oct 19 '14 at 23:32
  • have you tried to attach gdb to the process? – Alex Oct 20 '14 at 15:14
  • What kind of Java server is doing this? A pure Java command-line-based program? Apache/Tomcat? – RealSkeptic Oct 20 '14 at 18:37
  • @Alex, I tried attaching gdb, it exits when KILL is sent and there is no stack. – Howy Oct 20 '14 at 22:18
  • @RealSkeptic It's a java spring framework server, doing network request, passing data to external C processes using Process pipe to do CPU intensive work and back to the java server for consumption. Regular Java process builder introduces too much delay. – Howy Oct 20 '14 at 22:20
  • @bestsss There is no cron tasks, it's a clean machine from aws. – Howy Oct 20 '14 at 22:21
  • @bestsss I didn't try Mapped Direct Buffers IPC. I was implementing my own Process pipe using JNI for faster access. There are no good Java based solution for low latency process pipe for IPC. Maybe Mapped Direct Buffers are equally fast. I can explore, but that won't explain why the process was killed mysteriously. Thanks. – Howy Oct 20 '14 at 22:24
  • Shared mem should be the fastest (lowest latency) IPC there is but it's really hard to use (properly) and in Java world you need some inherent knowledge of NIO... Mentioning AWS - have you managed to reproduce the problem on a local machines (non-vm). – bestsss Oct 21 '14 at 01:30
  • @bestsss thanks. I haven't tried on a local machine. Will give it a shot. – Howy Oct 21 '14 at 17:57

3 Answers3

21

Assuming that you have root access on your machine, you can enable audit on kill(2) syscall to gather such information.

root # auditctl -a exit,always -F arch=b64 -S kill -F a1=9
root # auditctl -l
LIST_RULES: exit,always arch=3221225534 (0xc000003e) a1=9 (0x9) syscall=kill

root # sleep 99999 &
[2] 11688
root # kill -9 11688

root # ausearch -sc kill
time->Tue Oct 14 00:38:44 2014
type=OBJ_PID msg=audit(1413272324.413:441376): opid=11688 oauid=52872 ouid=0 oses=20 ocomm="sleep"
type=SYSCALL msg=audit(1413272324.413:441376): arch=c000003e syscall=62 success=yes exit=0 a0=2da8 a1=9 a2=0 a3=0 items=0 ppid=6107 pid=6108 auid=52872 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsg
id=0 tty=pts2 ses=20 comm="bash" exe="/bin/bash" key=(null)

The other way is to set up kernel tracing which may be an over-kill when audit system can do same work.

Whymarrh
  • 13,139
  • 14
  • 57
  • 108
Gary Luo
  • 266
  • 1
  • 2
  • Thanks, I ran the exact same commands as you mentioned above, and unfortunately "sleep" command is not caught. Here is my result: ubuntu@ip-172-31-26-106:~$ sudo auditctl -l LIST_RULES: exit,always arch=3221225534 (0xc000003e) a1=9 (0x9) syscall=kill LIST_RULES: exit,always auid=0 syscall=all ubuntu@ip-172-31-26-106:~$ sleep 99999 & [1] 10614 ubuntu@ip-172-31-26-106:~$ kill -9 10614 ubuntu@ip-172-31-26-106:~$ [1]+ Killed sleep 99999 ubuntu@ip-172-31-26-106:~$ sudo ausearch -sc kill ---- – Howy Oct 14 '14 at 17:23
  • The command only caught kill command if I ran "sudo kill -9 pidOfSleep". Without sudo it won't catch anything. What's more, I couldn't find why my long running process gets killed in the log. – Howy Oct 14 '14 at 17:44
  • Could it be that you were using 32 bit syscall? 'uname -a' and 'file /bin/bash' should give you some hints. If that's the case you need also audit 32 bit syscall, e.g. '# auditctl -a exit,always -F arch=b32 -S kill -F a1=9'. – Gary Luo Oct 14 '14 at 20:10
  • No, it's 64 bits. It works now, maybe my rule was set wrong initially. sleep kill is caught. However, my process's kill message was not caught. Is it possible, the process killed itself with a KILL message when certain ulimits has reached? – Howy Oct 14 '14 at 23:14
  • Audit should be able to capture all kill syscalls, no matter who sent to whom. FWIW I posted one other method to use kernel trace, which you can also give a try. – Gary Luo Oct 15 '14 at 02:43
  • Based on https://serverfault.com/questions/801021/how-to-check-how-my-process-in-linux-centos-gets-killed I added `-a always,exit -F arch=x86_64 -S kill -F key=catch_kill` to `/etc/audit/audit.rules` and then ran `service auditd restart` to make sure this rule stayed in effect in perpetuity. To search I used `ausearch -k catch_kills` so it seems the key attribute is helpful. – Matt Faus Feb 21 '18 at 19:52
5

Finally I figured out the reason why.

The child process kills itself and it's a linux kernel bug.

Details:

1) The child process kills itself, that's why strace/perf/auditctl cannot track it down.

2) The JNI call to create a process is triggered from a Java thread. When the thread eventually dies, it's also destroying the process that it creates.

3) In my code to fork and execve() a child process, I have the code to monitor parent process death and kill my child process with the following line: prctl( PR_SET_PDEATHSIG, SIGKILL ); I didn't pay special attention to this flag before b/c it's considered as a BEST PRACTICE for my other projects where child process is forked from the main thread.

4) If I comment out this line, the problem is gone. The original purpose is to kill the child process when the parent process is gone. Even w/o this flag, it's still the correct behavior. Seems like the ubuntu box default behavior.

5) From this article, https://bugzilla.kernel.org/show_bug.cgi?id=43300. it's a kernel bug, fixed in kernel version 3.4.0, my ubuntu box from AWS is kernel version 3.13.0-29-generic.

My machine configuration: ===>Ubuntu 14.04 LTS ===>3.13.0-29-generic

Some useful links to the issues:

a) http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

b) prctl(PR_SET_PDEATHSIG, SIGNAL) is called on parent thread exit, not parent process exit

c) https://bugzilla.kernel.org/show_bug.cgi?id=43300

Community
  • 1
  • 1
Howy
  • 825
  • 2
  • 10
  • 20
3

As I mentioned earlier, the other choice is to use kernel trace, which can be done by perf tool.

# apt-get install linux-tools-3.13.0-35-generic
# perf list | grep kill
 syscalls:sys_enter_kill                            [Tracepoint event]
 syscalls:sys_exit_kill                             [Tracepoint event]
 syscalls:sys_enter_tgkill                          [Tracepoint event]
 syscalls:sys_exit_tgkill                           [Tracepoint event]
 syscalls:sys_enter_tkill                           [Tracepoint event]
 syscalls:sys_exit_tkill                            [Tracepoint event]
# perf record -a -e syscalls:sys_enter_kill sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.634 MB perf.data (~71381 samples) ]

// Open a new shell to kill. 
$ sleep 9999 &
[1] 2387
$ kill -9 2387
[1]+  Killed                  sleep 9999
$ echo $$
9014

// Dump the trace in your original shell.
# perf script
...
        bash  9014 [001] 1890350.544971: syscalls:sys_enter_kill: pid: 0x00000953, sig: 0x00000009
Gary Luo
  • 266
  • 1
  • 2
  • Thanks, I tried it, it's strange. Sleep is caught with your example, but my process killed is NOT caught. That's really weird. "+++ killed by SIGKILL +++" could mean something else? – Howy Oct 15 '14 at 17:33
  • A quick look at strace code indicates that the tool essentially calls signame(WTERMSIG(status)) to decode exit status from the process being traced, which you can check also by hand, e.g. '$ echo $(($? & 0x7f))'. You can also refine your strace command to only trace signal events, e.g. 'strace -f -v -e trace=signal -tt -p xxx'. With that it will give you more information on which tid receives which signal at when. – Gary Luo Oct 16 '14 at 05:46
  • It's the same. Not giving me more details. It's telling me the following: sudo strace -f -v -e trace=signal -tt -p18468 Process 18468 attached 22:23:44.705727 +++ killed by SIGKILL +++ – Howy Oct 16 '14 at 22:25
  • Now I suspect Java's Garbage collector accidentally killed the process. My process is launched from a JNI call, which is triggered from a singleton java object. If it's GCed, that means the parent process kills the child process, why is auditctl and perf tools not able to track it down. – Howy Oct 16 '14 at 22:29
  • Traced the code again, and couldn't find a reason why Java's GC will garbage collect a singleton object associated with JNI. I am wondering 2 thing. 1) why is powerful tools such as strace, auditctl and perf script not able to track down who caused the kill? 2) Is ++KILL++ really means its killed from signal? Kind of weird. – Howy Oct 18 '14 at 00:28
  • Thanks, Gary, I found the reason. See my answer. – Howy Oct 25 '14 at 19:55