
I have a similar problem to this one: Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"

I have a daemon process that runs OK for a few minutes and then fails to run shell programs via popen2.Popen3(). It spawns 20 threads. Memory does not appear to be the issue; this is the only program running on the machine, which has 2G of RAM, and it's using less than 400M. I've been logging ru_maxrss and this is only 50M (before and after OSError is raised).

ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15962
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15962
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I've also been watching free -m and ls /proc/$pid/fd | wc -l while it is running, and neither of these seems to indicate resource exhaustion. Here's typical free -m while running:

             total       used       free     shared    buffers     cached
Mem:          2003        374       1628          0         46        154
-/+ buffers/cache:        173       1830
Swap:          283          0        283

... and the fd count is around 90-100.

The host is Ubuntu 12.04 (server JeOS, a minimal VM), Python 2.7.3, running on a VMware host.

So I'm wondering: what do I do next to diagnose why this is failing? Are there some more resource stats I can gather? Do I need to get down to the level of strace?
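For gathering more stats from inside the daemon before reaching for strace, here is a minimal sketch of an in-process logging helper. The function name is hypothetical; it reads the same numbers the question tracks externally (ru_maxrss, fd count) plus the kernel's own view from /proc, and is Linux-specific:

```python
import os
import resource

def log_process_stats():
    """Snapshot a few resource numbers for the current process (Linux only)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    stats = {'ru_maxrss_kb': usage.ru_maxrss}  # peak RSS; kilobytes on Linux

    # Open descriptor count: same number as `ls /proc/$pid/fd | wc -l`.
    stats['open_fds'] = len(os.listdir('/proc/self/fd'))

    # Virtual size, resident size and thread count straight from the kernel.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith(('VmSize:', 'VmRSS:', 'Threads:')):
                key, value = line.split(':', 1)
                stats[key] = value.strip()
    return stats
```

Logging this dict just before each Popen call would show whether VmSize (address space) is growing even while ru_maxrss (physical memory) stays flat, which is exactly the distinction the accepted-style answers below turn on.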

  • How many processes is the user running? – Ignacio Vazquez-Abrams May 23 '14 at 01:51
  • Is the system configured to overcommit or not? Memory accounting will calculate the same amount of memory that the parent uses before forking(cloning) a child. How big is your app before forking? – alvits May 23 '14 at 02:00
  • 1 process, 20 threads. Overcommit is on. I don't know how big my app is before forking; isn't that what ru_maxrss tells you? If not, how would I find out? – Alistair Bayley May 23 '14 at 09:56
  • Wow, this is tricky. Assuming you can start other processes from a shell, memory itself is clearly not the issue, so it must be some other limit. Try much higher (or unlimited) settings for: `max locked memory` (it's oddly low), `stack size` (in case your process is at the very end of the Python stack and a C call fails), and any system-level quotas or limits that are not part of ulimit... – Dima Tisnek Dec 12 '14 at 10:46
  • Try running with strace to see which system call fails (most likely it will be either `fork` or `exec*`, but it might be something else too). – Nikratio Apr 28 '15 at 01:17
  • Try running your process with `strace -ff`. This will tell you what syscall is failing and may give useful hints on what's going wrong. – Andrea Corbellini May 06 '15 at 19:37
  • If the underlying problem is, in fact, same as in linked question, then adding swap of size comparable to RAM is an acceptable work-around. This can also be used as a simple test for hypothesis in linked accepted answer. To restate the problem, `Popen == fork + exec` where `fork` [potentially, almost] doubles used memory for a miniscule period of time. Kernel could be telling you to heck off... – Dima Tisnek May 14 '15 at 07:07
  • I recommend an http://sscce.org/ – dstromberg May 21 '15 at 23:43
  • +1 for `strace`. If you accidentally try to allocate 2**64 - 2 bytes of memory due to integer underflow, you'll get `ENOMEM`. – myaut May 22 '15 at 06:19
  • You really should not use os.popen() any longer, especially if you are on 2.7+. See https://docs.python.org/2/library/os.html#file-object-creation and use subprocess. – cgseller Jul 14 '15 at 00:25
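As the last comment notes, the `popen2` module is deprecated in favour of `subprocess`. A minimal sketch of the equivalent `subprocess.Popen` call (the command shown is just a placeholder, not the asker's actual program):

```python
import subprocess

# Rough equivalent of popen2.Popen3("some-command") using subprocess.
# close_fds=True keeps the child from inheriting the daemon's descriptors.
proc = subprocess.Popen(
    ['echo', 'hello'],          # placeholder command; substitute your own
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    close_fds=True,
)
out, err = proc.communicate()
```

This doesn't change the underlying fork+exec cost that the ENOMEM discussion is about, but subprocess raises clearer errors and avoids the fd-leak pitfalls of the older modules.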

2 Answers


Hypothesis: if your VM is 32-bit, you may be running out of address space.

Not memory: address space. Let me explain: on Linux many things (IO, the video card, memory-mapped files) use up address space without necessarily consuming a corresponding amount of main memory.

Here's an explanation of related issues:

http://us.download.nvidia.com/XFree86/Linux-x86/331.89/README/knownissues.html

(look for "Kernel virtual address space exhaustion on the X86 platform" section, use dmesg to test if that's the situation)

An `ENOMEM` as the result of `mmap` can mean "not enough address space" just as easily as "not enough memory", although I'm not sure how to diagnose the difference from within CPython. And if any process on the system has some big files mmapped, that alone eats into the available address space.
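A quick sanity check for this hypothesis is to confirm whether the interpreter is actually 32-bit, and how close the kernel's commit accounting is to its limit. A sketch (Linux-specific; the /proc fields shown are standard but the interpretation depends on your overcommit setting):

```python
import struct

# A 32-bit process has a 4 GB address space (roughly 3 GB usable user
# space on x86 Linux), no matter how much RAM the machine has.
pointer_bits = struct.calcsize('P') * 8
print('interpreter is %d-bit' % pointer_bits)

# CommitLimit vs Committed_AS shows how close the kernel is to refusing
# new allocations when overcommit accounting is strict.
with open('/proc/meminfo') as f:
    for line in f:
        if line.startswith(('CommitLimit:', 'Committed_AS:')):
            print(line.strip())
```

If this reports 32-bit, the address-space ceiling applies regardless of the healthy-looking `free -m` output in the question.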


Check whether you have run out of space on your disk drive; that was the problem in my case.

bravo@by1-dotbravo-01:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              16G   16G     0 100% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sdb              296G  162G  119G  58% /home