9

I am doing some bioinformatics work. I have a Python script that at one point calls an external program to do an expensive task (sequence alignment, which uses a lot of computational power and memory). I call it using subprocess.Popen. When I run it on a test case, it completes fine. However, when I run it on the full file, where it has to do this multiple times for different sets of inputs, it dies. subprocess throws:

OSError: [Errno 12] Cannot allocate memory

I found a few links here and here and here to similar problems, but I'm not sure that they apply in my case.

By default, the sequence aligner will try to request 51000M of memory. It doesn't always use that much, but it might. With the full input loaded and processed, that much is not available. However, capping the amount it requests (or will attempt to use) at a lower value that should be available at run time still gives me the same error. I've also tried running with shell=True, with the same result.

This has been bugging me for a few days now. Thanks for any help.

Edit: Expanding the traceback, the error is thrown here:

File "..../python2.6/subprocess.py", line 1037, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Edit 2: Running Python 2.6.4 on 64-bit Ubuntu 10.04.

jmerkin
  • Can you execute the subprocess program in standalone mode, from the command line? Can you launch several instances setting them to the background (terminating the command with `&`)? How about running `time -v foo ...` to get some stats on the program's use of computer resources? – Apalala Mar 16 '11 at 14:16

4 Answers

15

I feel really sorry for the OP. Six years later and no one has mentioned that this is a very common problem on Unix, and actually has nothing to do with Python or bioinformatics. A call to os.fork() temporarily doubles the memory accounted to the parent process (the memory of the parent process must be made available to the child), before throwing it all away with an exec(). While this memory isn't always actually copied, the system must have enough memory to allow for it to be copied, so if your parent process is using more than half of the system memory and you subprocess out even "wc -l", you're going to run into a memory error.
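To make this concrete, here is a minimal sketch of the failure mode (not from the original post; the allocation size is a placeholder, and whether fork() actually fails depends on the kernel's overcommit settings, e.g. it reproduces reliably with vm.overcommit_memory=2):

    # A parent process holding most of system RAM can fail to fork even a
    # tiny child, because fork() must account for a full copy of the
    # parent's address space.
    import subprocess

    big = bytearray(50 * 1024 ** 3)  # hypothetical ~50 GB; adjust to exceed half your RAM

    # With strict overcommit accounting, this raises
    # OSError: [Errno 12] Cannot allocate memory inside Popen's fork().
    subprocess.Popen(["wc", "-l", "/etc/hosts"]).wait()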

The solution is to use posix_spawn, or to create all your subprocesses at the beginning of the script, while memory consumption is low, and then use them later on, after the parent process has done its memory-intensive work.
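A minimal sketch of the posix_spawn route (os.posix_spawn is available as of Python 3.8; the wc command here is just a placeholder for the real aligner):

    # posix_spawn creates the child directly, without fork()'s
    # copy-the-whole-parent memory accounting.
    import os

    pid = os.posix_spawn("/usr/bin/wc", ["wc", "-l", "/etc/hosts"], os.environ)
    _, status = os.waitpid(pid, 0)  # reap the child
    print("exit status:", os.WEXITSTATUS(status))

Python 3's multiprocessing module builds on the second idea with its "forkserver" start method: a small helper process is spawned early, while the parent is still small, and it forks workers on the parent's behalf.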

A Google search using the keywords "os.fork" and "memory" will turn up several Stack Overflow posts on the topic that further explain what's going on :)

J.J
0

I'd run a 64-bit Python on a 64-bit OS.

With 32-bit, you can only really get 3 GB of RAM before the OS starts telling you no more.

Another alternative might be to use memory mapped files to open the file:

http://docs.python.org/library/mmap.html
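For example, a minimal sketch (the filename is hypothetical):

    # mmap lets the OS page the file in and out on demand instead of
    # holding the whole thing in the Python process's heap.
    import mmap

    with open("reads.fasta", "rb") as f:  # hypothetical large input file
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            first_newline = mm.find(b"\n")  # operate on slices, not the whole file
            print(mm[:first_newline])
        finally:
            mm.close()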

Edit: Ah, you're on 64-bit. Possibly the cause is that you're running out of RAM + swap; the fix might be to increase the amount of swap.

matiu
  • I am running 64bit on both ends. What is the advantage to using a memory mapped file vs regular files? – jmerkin Mar 15 '11 at 13:36
  • well one advantage is you can tell the OS to optimize for either random or sequential access, and you can open and chunk files that are larger than your RAM. – matiu Apr 26 '11 at 23:37
  • the downside of memory mapped files are, if your machine goes down unexpectedly, they're very corruptible. – matiu Apr 26 '11 at 23:39
0

This doesn't have anything to do with Python or the subprocess module. subprocess.Popen is merely reporting to you the error that it is receiving from the operating system. (What operating system are you using, by the way?) From man 2 fork on Linux:

ENOMEM    fork()  failed  to  allocate  the  necessary  kernel  structures
          because memory is tight.

Are you calling subprocess.Popen multiple times? If so then I think the best you can do is make sure that the previous invocation of your process is terminated and reaped before the next invocation.
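For instance, a sketch of that pattern (the job list and the wc command are placeholders for the real aligner invocations):

    import subprocess

    jobs = [["wc", "-l", "/etc/hosts"]]  # placeholder argument lists

    for args in jobs:
        proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()  # blocks until exit and reaps the child
        print(out)                     # stand-in for handling the alignment output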

rlibby
  • I am calling it multiple times, but in succession. I throw a subprocess.communicate() in there because I want the stdout from each. – jmerkin Mar 15 '11 at 13:35
  • 1
    Also, if this is it running out of memory, why doesn't linux use a swapfile (the machine is a shared resource and has 64g of ram so I think it has no defined swap space)? – jmerkin Mar 15 '11 at 13:58
  • @jmerkin, use `swapon -s` to tell if any swap is configured, use `free` to tell how much free physical memory and swap space you have. If you don't have swap, configuring swap is one way to increase available virtual memory. Also, I recommend you answer @Konstantin Tenzin's question. Are you actually using the result of the subprocess (stdout/stderr)? If so, is the result large? If so, what are you doing to make sure your Python process doesn't consume all available memory? – rlibby Mar 15 '11 at 15:19
  • yes, swap and memory is plentiful. I am using the result of stdout, and it will be at most a few megabytes. Either way, I want it written to a file. I have tried using stdout=subprocess.PIPE as well as subprocess.Popen('foo > bar', shell=True, stderr=subprocess.PIPE) to write it to the file. Both gave the error. – jmerkin Mar 15 '11 at 15:58
  • @jmerkin, the `stdout` argument can take a "file object", e.g. `subprocess.Popen("foo", stdout=open("bar", "w"))`. Does `ulimit -a` report anything surprising? – rlibby Mar 15 '11 at 16:16
  • that'll save me a few lines of code, thanks. no, ulimit -a looks pretty ok and the values match those that resource.rlimit tell me, so python doesn't think something different than the machine. – jmerkin Mar 15 '11 at 16:25
  • Is your Python process very large in memory? You've somehow convinced the kernel that it can't fork your Python process. Maybe your Python process is large (as in, has many page table entries). Maybe you're hitting some other kind of resource limit. If it is a problem of large Python process, then when `subprocess.Popen("foo")` fails, basically all other `subprocess.Popen` calls should also fail. Try it: `try: ... except OSError: subprocess.Popen("ls")` (should fail with same error), and additionally if you call foo from a shell, it should work. – rlibby Mar 15 '11 at 16:57
  • @rlibby It is pretty big. I have been confining the subprocess to using less ram by feeding it a parameter. I'm interested in what the limit is, if anybody has an idea. I don't see why it shouldn't start using some swap before dying. – jmerkin Mar 17 '11 at 18:45
  • @jmerkin, I will bet the problem is the size of your Python process. Try a scheme that uses `clone`/`exec` instead of `fork`/`exec`. Python's [`os.spawn`](http://docs.python.org/library/os.html#os.spawnlp) family empirically seems to do this. (Check yourself with `strace -f`.) Unfortunately it will mean a little more process bookkeeping on your end. – rlibby Apr 04 '11 at 00:20
0

Do you use subprocess.PIPE? I have had problems myself, and read about similar problems, when it was used. Temporary files usually solved the problem.
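A minimal sketch of that approach (the wc command is a placeholder for the real program):

    import subprocess
    import tempfile

    with tempfile.TemporaryFile() as out:
        # stdout goes to a real file, not a pipe
        subprocess.check_call(["wc", "-l", "/etc/hosts"], stdout=out)
        out.seek(0)
        result = out.read()
        print(result)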

Konstantin Tenzin
  • I have tried both ways. I tried writing directly to a file as well as subprocess.PIPE: a = subprocess.Popen('foo > bar', shell=True, stderr=subprocess.PIPE); a.wait() and a = subprocess.Popen(foo, stdout=subprocess.PIPE, stderr=subprocess.PIPE); f_out.write(a.communicate()[0]), and that gave me this error as well. Stderr is pretty negligible, a few lines at most concerning the program's progress, so it shouldn't be much. – jmerkin Mar 15 '11 at 16:00