801

My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.

Recently a user was executing the application for a while and it died mysteriously. The text:

Killed

was on the terminal. This happened twice. I asked whether someone at a different terminal had used the kill command to kill the process; no one had.

Under what conditions would Linux decide to kill my process? I believe the shell displayed "Killed" because the process died after receiving SIGKILL (signal 9). If Linux sent the kill signal, should there be a message in a system log somewhere that explains why it was killed?

sbq
  • linux killed my process and logged it in /var/log/messages on redhat – Dean Hiller Sep 18 '13 at 12:45
  • See also [this answer](http://unix.stackexchange.com/questions/136291/will-linux-start-killing-my-processes-without-asking-me-if-memory-gets-short) on unix.stackexchange.com. – Richard Dec 28 '14 at 21:06
  • There are 3 players in this event: (1) The process which (common cause) takes too much memory and causes the OOM condition (2) The kernel which sends the SIGKILL (signal 9) to terminate it and logs the fact in some system log like `/var/log/messages` (3) The shell under which the process ran which is the process that prints the `Killed` notification when the exit status from `waitpid(2)` indicates the child process died from signal 9. – arielf Dec 18 '19 at 05:39
  • After reading @DeanHiller's answer, I found log messages on Ubuntu under `/var/log/syslog` – Dinei Feb 24 '20 at 23:37
  • Related: [How do I disable the systemd OOM process killer in Ubuntu 22.04?](https://askubuntu.com/q/1404888/78223) – kenorb Sep 23 '22 at 17:16

13 Answers

491

If the user or sysadmin did not kill the program, the kernel may have. The kernel would only kill a process under exceptional circumstances such as extreme resource starvation (think mem+swap exhaustion).
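
When this happens the kernel logs the kill, so it can be confirmed after the fact. A minimal check, assuming one of the usual distribution log locations (the exact file varies, as the comments below note):

# Look for OOM activity in the persistent kernel logs (use whichever of these exist on your system)
grep -i -E 'out of memory|oom-killer|killed process' /var/log/kern.log /var/log/messages /var/log/syslog 2>/dev/null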

dwc
  • If the kernel killed the process would it put a message in a log somewhere? – sbq Apr 07 '09 at 17:29
  • I just wrote a program that malloc'd memory in an infinite loop. After the system got slow, "Killed" was displayed in the terminal and the process was terminated. The file /var/log/kern.log contained a lot of info about the termination. Thanks for the pointer. – sbq Apr 07 '09 at 17:49
  • That's almost definitely it. I saw this a lot when TAing. Many students would forget to free their objects, and the apps would eventually reach 3GB of virtual memory usage. As soon as it hit that point it was killed. – Herms Apr 07 '09 at 19:03
  • Does the OS actually kill the process or does the program simply crash? I think allocating memory when nothing is available is a fatal error in C++. – wp123 May 12 '10 at 14:54
  • The OS will actually kill the process, regardless of the language used. Failure to allocate memory is handled in various ways in various languages, but typically it's a matter of handling an exception or checking return status. This won't help if you've successfully allocated memory and the OOM killer targets you. – dwc May 13 '10 at 20:41
  • The OOM killer strikes again! – new123456 Jul 22 '11 at 00:36
  • When the "program simply crashes", that *is* the OS actually killing the process! – Bernd Jendrissek Nov 23 '11 at 13:11
  • Here is a good article on how to protect your process from being killed: http://backdrift.org/how-to-create-oom-killer-exceptions – Asad Hasan Aug 16 '12 at 00:10
  • Use `dmesg` to see the kernel log: here I found my Python processes killed by the kernel due to extreme virtual memory consumption. – caneta Aug 28 '13 at 07:43
  • @BerndJendrissek: Some "program simply crashes" are hidden calls to `terminate`; I don't know if that counts as the OS killing it. – Mooing Duck Mar 14 '14 at 21:01
  • It's the kernel OOM killer. Took me a while to find this article. http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html – jbrahy Oct 09 '14 at 07:00
  • Once I found the log in kern.log, it was clear it had been killed by the OOM killer. – Chris May 21 '15 at 01:57
  • Is there any way to prevent it, and kill the program gracefully? – lordcheeto Feb 08 '16 at 23:35
  • @wp123 this is impossible in Linux. In Linux, the kernel will always return true when you call malloc, and it will simply try to find ways of giving all processes the memory they need, whether by swapping or by some other means. In extreme circumstances, such as apparently in this case, the kernel will kill a process. At no point does the program/programmer get any indication anything is wrong. – toon81 Apr 04 '18 at 12:40
419

Try:

dmesg -T | grep -E -i -B100 'killed process'

where -B100 is the number of lines of context to show before the match.

Omit -T on Mac OS.
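
Note that dmesg only reads the kernel ring buffer, which can wrap. On systemd-based systems the persistent journal keeps the same kernel messages; a rough equivalent, assuming journald is running:

# -k restricts the journal to kernel messages (the same source dmesg reads)
journalctl -k | grep -E -i -B100 'killed process'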

Ravindranath Akila
  • FYI, from `info egrep`: "egrep is the same as grep -E. ... Direct invocation as either egrep or fgrep is deprecated" – Air May 21 '14 at 15:49
  • In the case of a simple pattern like `'killed process'` you can just use `grep` instead of `egrep` with no other changes. For a more complex pattern, you would replace e.g. `egrep -i -B100 'foo|ba[rz]'` with `grep -E -i -B100 'foo|ba[rz]'`. [This Q&A](http://stackoverflow.com/q/18058875/2359271) gives more detail. – Air May 22 '14 at 15:10
  • I'd also suggest using `dmesg -T` in order to get readable timestamps – gukoff Nov 22 '17 at 11:05
  • If you only want to see a list of recently killed process on a server, try using `dmesg -T| grep -E 'Killed process'` – Meet Sinojia Jun 19 '20 at 14:43
  • Please be aware that the `dmesg` command only displays messages stored in the kernel ring buffer. If you are searching for OOM (Out of Memory) messages, they could potentially be overwritten and might not be accessible via the dmesg command. You will need to search in other locations like `/var/log/kern.log` or `/var/log/messages` – caltuntas Aug 15 '23 at 11:57
196

This looks like a good article on the subject: Taming the OOM killer.

The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.

Edit: And here is another article that explains pretty well how a process is chosen (annotated with some kernel code examples). The great thing about this is that it includes some commentary on the reasoning behind the various badness() rules.
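
To see where these knobs live on a running system, the standard procfs locations can be inspected directly (a sketch; <pid> is a placeholder for a real process ID):

# System-wide overcommit policy: 0 = heuristic, 1 = always overcommit, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory

# Per-process view: the current badness score and the user-adjustable offset
cat /proc/<pid>/oom_score
cat /proc/<pid>/oom_score_adj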

Adam Jaskiewicz
  • I really love the article links. I'd suggest anyone who's interested in the topic to read them -- especially the comments on the lwn article. – Jon Bringhurst Mar 15 '11 at 15:33
  • "Linux will give it that space, even if it is claimed by another process" That's not quite how virtual memory works... – Mooing Duck Mar 14 '14 at 21:02
  • The article is quite old (2009) and not all functionality suggested in the article is in mainline. – Alexander Oh Apr 26 '18 at 09:07
  • Related: [How do I disable the systemd OOM process killer in Ubuntu 22.04?](https://askubuntu.com/q/1404888/78223) – kenorb Sep 23 '22 at 17:17
64

Let me first explain when and why the OOM Killer gets invoked.

Say you have 512 MB of RAM + 1 GB of swap. So, in theory, your processes have access to a total of 1.5 GB of virtual memory.

Now, for some time everything runs fine within the 1.5 GB of total memory. But all of a sudden (or gradually) your system starts consuming more and more memory, until it reaches a point where around 95% of the total memory is in use.

Now say a process requests a large chunk of memory from the kernel. The kernel checks the available memory and finds that there is no way it can allocate your process more memory. So it will try to free some memory by invoking the OOM Killer (http://linux-mm.org/OOM).

The OOM Killer has its own algorithm to score and rank every process. Typically, the process using the most memory becomes the victim to be killed.

Where can I find the OOM Killer's logs?

Typically in the /var/log directory: either /var/log/kern.log or /var/log/dmesg.

Hope this helps.

Some typical solutions:

  1. Increase memory (not swap)
  2. Find the memory leaks in your program and fix them
  3. Restrict the memory any one process can consume (for example, JVM memory can be restricted using JAVA_OPTS); see the sketch after this list
  4. See the logs and google :)
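
A rough sketch of option 3, assuming a JVM application whose launch script honours JAVA_OPTS (the values and the ./my_app name are only illustrative):

# Cap the JVM heap so the Java process cannot grow without bound
export JAVA_OPTS="-Xms256m -Xmx512m"

# Alternatively, cap the address space of anything started from this shell with ulimit;
# allocations beyond the limit then fail inside the program instead of waking the OOM Killer
ulimit -v 1048576   # virtual memory limit in KiB (about 1 GiB)
./my_app &
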
Jadav Bheda
25

This is the Linux out-of-memory killer (OOM killer). Your process was selected due to 'badness' - a combination of recency, resident size (memory in use, rather than just allocated) and other factors.

To see the kernel's record of the event, check the journal:

sudo journalctl -xb

You'll see a message like:

Jul 20 11:05:00 someapp kernel: Mem-Info:
Jul 20 11:05:00 someapp kernel: Node 0 DMA per-cpu:
Jul 20 11:05:00 someapp kernel: CPU    0: hi:    0, btch:   1 usd:   0
Jul 20 11:05:00 someapp kernel: Node 0 DMA32 per-cpu:
Jul 20 11:05:00 someapp kernel: CPU    0: hi:  186, btch:  31 usd:  30
Jul 20 11:05:00 someapp kernel: active_anon:206043 inactive_anon:6347 isolated_anon:0
                                    active_file:722 inactive_file:4126 isolated_file:0
                                    unevictable:0 dirty:5 writeback:0 unstable:0
                                    free:12202 slab_reclaimable:3849 slab_unreclaimable:14574
                                    mapped:792 shmem:12802 pagetables:1651 bounce:0
                                    free_cma:0
Jul 20 11:05:00 someapp kernel: Node 0 DMA free:4576kB min:708kB low:884kB high:1060kB active_anon:10012kB inactive_anon:488kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present
Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 968 968 968
Jul 20 11:05:00 someapp kernel: Node 0 DMA32 free:44232kB min:44344kB low:55428kB high:66516kB active_anon:814160kB inactive_anon:24900kB active_file:2884kB inactive_file:16500kB unevictable:0kB isolated(anon):0kB isolated
Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 0 0 0
Jul 20 11:05:00 someapp kernel: Node 0 DMA: 17*4kB (UEM) 22*8kB (UEM) 15*16kB (UEM) 12*32kB (UEM) 8*64kB (E) 9*128kB (UEM) 2*256kB (UE) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 4580kB
Jul 20 11:05:00 someapp kernel: Node 0 DMA32: 216*4kB (UE) 601*8kB (UE) 448*16kB (UE) 311*32kB (UEM) 135*64kB (UEM) 74*128kB (UEM) 5*256kB (EM) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 44232kB
Jul 20 11:05:00 someapp kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 20 11:05:00 someapp kernel: 17656 total pagecache pages
Jul 20 11:05:00 someapp kernel: 0 pages in swap cache
Jul 20 11:05:00 someapp kernel: Swap cache stats: add 0, delete 0, find 0/0
Jul 20 11:05:00 someapp kernel: Free swap  = 0kB
Jul 20 11:05:00 someapp kernel: Total swap = 0kB
Jul 20 11:05:00 someapp kernel: 262141 pages RAM
Jul 20 11:05:00 someapp kernel: 7645 pages reserved
Jul 20 11:05:00 someapp kernel: 264073 pages shared
Jul 20 11:05:00 someapp kernel: 240240 pages non-shared
Jul 20 11:05:00 someapp kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Jul 20 11:05:00 someapp kernel: [  241]     0   241    13581     1610      26        0             0 systemd-journal
Jul 20 11:05:00 someapp kernel: [  246]     0   246    10494      133      22        0         -1000 systemd-udevd
Jul 20 11:05:00 someapp kernel: [  264]     0   264    29174      121      26        0         -1000 auditd
Jul 20 11:05:00 someapp kernel: [  342]     0   342    94449      466      67        0             0 NetworkManager
Jul 20 11:05:00 someapp kernel: [  346]     0   346   137495     3125      88        0             0 tuned
Jul 20 11:05:00 someapp kernel: [  348]     0   348    79595      726      60        0             0 rsyslogd
Jul 20 11:05:00 someapp kernel: [  353]    70   353     6986       72      19        0             0 avahi-daemon
Jul 20 11:05:00 someapp kernel: [  362]    70   362     6986       58      18        0             0 avahi-daemon
Jul 20 11:05:00 someapp kernel: [  378]     0   378     1621       25       8        0             0 iprinit
Jul 20 11:05:00 someapp kernel: [  380]     0   380     1621       26       9        0             0 iprupdate
Jul 20 11:05:00 someapp kernel: [  384]    81   384     6676      142      18        0          -900 dbus-daemon
Jul 20 11:05:00 someapp kernel: [  385]     0   385     8671       83      21        0             0 systemd-logind
Jul 20 11:05:00 someapp kernel: [  386]     0   386    31573      153      15        0             0 crond
Jul 20 11:05:00 someapp kernel: [  391]   999   391   128531     2440      48        0             0 polkitd
Jul 20 11:05:00 someapp kernel: [  400]     0   400     9781       23       8        0             0 iprdump
Jul 20 11:05:00 someapp kernel: [  419]     0   419    27501       32      10        0             0 agetty
Jul 20 11:05:00 someapp kernel: [  855]     0   855    22883      258      43        0             0 master
Jul 20 11:05:00 someapp kernel: [  862]    89   862    22926      254      44        0             0 qmgr
Jul 20 11:05:00 someapp kernel: [23631]     0 23631    20698      211      43        0         -1000 sshd
Jul 20 11:05:00 someapp kernel: [12884]     0 12884    81885     3754      80        0             0 firewalld
Jul 20 11:05:00 someapp kernel: [18130]     0 18130    33359      291      65        0             0 sshd
Jul 20 11:05:00 someapp kernel: [18132]  1000 18132    33791      748      64        0             0 sshd
Jul 20 11:05:00 someapp kernel: [18133]  1000 18133    28867      122      13        0             0 bash
Jul 20 11:05:00 someapp kernel: [18428]    99 18428   208627    42909     151        0             0 node
Jul 20 11:05:00 someapp kernel: [18486]    89 18486    22909      250      46        0             0 pickup
Jul 20 11:05:00 someapp kernel: [18515]  1000 18515   352905   141851     470        0             0 npm
Jul 20 11:05:00 someapp kernel: [18520]     0 18520    33359      291      66        0             0 sshd
Jul 20 11:05:00 someapp kernel: [18522]  1000 18522    33359      294      64        0             0 sshd
Jul 20 11:05:00 someapp kernel: [18523]  1000 18523    28866      115      12        0             0 bash
Jul 20 11:05:00 someapp kernel: Out of memory: Kill process 18515 (npm) score 559 or sacrifice child
Jul 20 11:05:00 someapp kernel: Killed process 18515 (npm) total-vm:1411620kB, anon-rss:567404kB, file-rss:0kB
mikemaccana
  • How can I find out the memory deficit amount? – TD1 Apr 16 '21 at 10:38
  • @TD1 it depends on the amount of data - normally you'd have a heap snapshot of the process (which depends on the programming language / VM used). But the answer may be - "the deficit is infinite, because you have a memory leak" - eg you're adding to an array you're using, and it just gets bigger the longer your program runs. – mikemaccana Apr 16 '21 at 10:41
  • Just a note, if you add grep like so: sudo journalctl -xb | grep "Killed process " it will display just the lines about killing the process you're interested in. –  Feb 13 '23 at 15:00
17

As dwc and Adam Jaskiewicz have stated, the culprit is likely the OOM Killer. However, the question that follows is: how do I prevent this?

There are several ways:

  1. Give your system more RAM if you can (easy if it's a VM).
  2. Make sure the OOM killer chooses a different process.
  3. Disable the OOM Killer
  4. Choose a Linux distro which ships with the OOM Killer disabled.

I found (2) to be especially easy to implement:

Adjust /proc/<PID>/oom_score_adj to -1000 (which automatically takes oom_adj to -17 and oom_score to 0).
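
A minimal sketch of that adjustment (requires root; <PID> is a placeholder for the process ID you want to protect):

# Tell the OOM killer never to select this process
echo -1000 | sudo tee /proc/<PID>/oom_score_adj

# Verify the new value
cat /proc/<PID>/oom_score_adj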

See How To Create OOM Exclusions in Linux for more.

Carl
  • It was the RAM for me. I upgraded from 2 to 4GB RAM and the problem was gone. Now the problem is with the bill :P – Gus Oct 24 '19 at 13:52
  • Way #2: The article was useful but it's outdated. You should now adjust `/proc/<PID>/oom_score_adj` to -1000 (which automatically takes `oom_adj` to -17 and `oom_score` to 0, so your process never gets killed) – maganap Dec 02 '20 at 09:57
  • @maganap excellent comment, I've added that to the answer since the article was outdated. – mikemaccana Feb 14 '23 at 10:04
13

A tool like SystemTap (or another tracer) can monitor kernel signal-transmission logic and report on it, for example: https://sourceware.org/systemtap/examples/process/sigmon.stp

# stap --example sigmon.stp -x 31994 SIGKILL
   SPID     SNAME            RPID  RNAME            SIGNUM SIGNAME
   5609     bash             31994 find             9      SIGKILL

The filtering if block in that script can be adjusted to taste, or eliminated to trace system-wide signal traffic. Causes can be further isolated by collecting backtraces (add a print_backtrace() and/or print_ubacktrace() to the probe, for kernel space and user space respectively).

fche
10

The PAM module used to limit resources caused exactly the result you describe: my process died mysteriously with the text Killed in the console window. There was no log output, neither in syslog nor in kern.log. The top program helped me discover that my process was killed after exactly one minute of CPU usage.
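
If you suspect PAM limits, the configuration and the limits inherited by your shell are worth checking; a sketch using the usual default locations (they can differ per distribution):

# Show any active PAM resource limits (a 'cpu' entry is expressed in minutes)
grep -v '^[[:space:]]*#' /etc/security/limits.conf /etc/security/limits.d/* 2>/dev/null

# Show the CPU-time limit (in seconds) that processes started from this shell inherit
ulimit -t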

Christian Ammer
4

In an LSF environment (interactive or otherwise), if the application exceeds memory utilization beyond some threshold preset by the admins on the queue, or beyond the resource request in the submission to the queue, the processes will be killed so other users don't fall victim to a potential runaway. It doesn't always send an email when it does so, depending on how it's set up.

One solution in this case is to find a queue with larger resources or define larger resource requirements in the submission.
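
For example, a submission that asks the scheduler for more memory up front might look like this (a sketch; the exact bsub options, units and limits depend on how your site's LSF is configured, and ./my_app is a placeholder):

# Request ~4 GB for the job and set the corresponding memory limit
bsub -R "rusage[mem=4096]" -M 4096 ./my_app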

You may also want to review man ulimit

Although I don't remember ulimit resulting in Killed, it's been a while since I needed it.

oldman
3

In my case this was happening with a Laravel queue worker. The system logs did not mention any killing, so I looked further, and it turned out that the worker was basically killing itself because a job exceeded the memory limit (which is set to 128 MB by default).

Running the queue worker with --timeout=600 and --memory=1024 fixed the problem for me.
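
For reference, the full invocation would look something like this (a sketch assuming the standard artisan worker; tune the values to your jobs):

# Allow each job up to 10 minutes and let the worker use up to 1024 MB before it exits
php artisan queue:work --timeout=600 --memory=1024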

iSWORD
2

We have had recurring problems under Linux at a customer site (Red Hat, I think), with the OOM Killer (out-of-memory killer) killing both our principal application (i.e. the reason the server exists) and its database processes.

In each case the OOM Killer simply decided that the processes were using too many resources... the machine wasn't even about to fail for lack of resources. Neither the application nor its database had problems with memory leaks (or any other resource leak).

I am not a Linux expert, but I rather gathered that its algorithm for deciding when to kill something, and what to kill, is complex. Also, I was told (I can't speak to the accuracy of this) that the OOM Killer is baked into the kernel and you can't simply not run it.

Lawrence Dol
  • IIRC, OOMKiller is only invoked as a last resort. I think the system will even send a signal to various apps asking them to kindly give up some resources before it is forced to invoke OOMKiller. Take with a grain of salt, as it's been a long time... – rmeador Apr 07 '09 at 18:04
  • You *can* simply not run it. It is baked into the kernel, but there are options to tune how it runs, and even which processes it is likely to kill. It runs when the whole system is out of memory, not when a specific process is using too much. See my answer for more details. – Adam Jaskiewicz Apr 07 '09 at 18:28
  • Not running oomkiller is pretty easy. `echo "2" > /proc/sys/vm/overcommit_memory` – R.. GitHub STOP HELPING ICE Feb 20 '12 at 00:05
  • Red Hat doesn't want to allow it to be changed: `sudo echo "2" > /proc/sys/vm/overcommit_memory` /proc/sys/vm/overcommit_memory: Permission denied – Brent Faust Jul 31 '14 at 21:46
  • Try `echo 2 | sudo tee /proc/sys/vm/overcommit_memory` – Hypershadsy Feb 12 '15 at 20:08
0

I encountered this problem lately. Finally, I found that my processes were killed just after an openSUSE zypper update was triggered automatically. Disabling the automatic zypper update solved my problem.

poordeveloper
  • I am seeing the same issue. How did u track down which process killed your process? Seems there is a tool to check who sends SIGKILL to a process. – Howy Oct 04 '14 at 18:59
0

The user has the ability to kill his own programs, using kill or Control+C, but I get the impression that's not what happened, and that the user complained to you.

root has the ability to kill programs of course, but if someone has root on your machine and is killing stuff you have bigger problems.

If you are not the sysadmin, the sysadmin may have set up quotas on CPU, RAM, or disk usage and auto-kills processes that exceed them.

Other than those guesses, I'm not sure without more info about the program.

Tom Ritter
  • CTRL-C sends a different kill than the OP reported (SIGINT (2) as I recall, whereas the program is receiving a SIGKILL (9)). – Powerlord Apr 07 '09 at 17:26