
I run Python 2.7 on a Linux machine with 16 GB of RAM and a 64-bit OS. A Python script I wrote can load too much data into memory, which slows the machine down to the point where I cannot even kill the process any more.

While I can limit memory by calling:

ulimit -v 12000000

in my shell before running the script, I'd like to include a limiting option in the script itself. Everywhere I looked, the resource module is cited as having the same power as ulimit. But calling:

import resource
_, hard = resource.getrlimit(resource.RLIMIT_DATA)
resource.setrlimit(resource.RLIMIT_DATA, (12000, hard))

at the beginning of my script does absolutely nothing. Even setting the value as low as 12000 never crashed the process. I tried the same with RLIMIT_STACK, with the same result. Curiously, calling:

import subprocess
subprocess.call('ulimit -v 12000', shell=True)

does nothing as well.

What am I doing wrong? I couldn't find any actual usage examples online.


edit: For anyone who is curious, using subprocess.call doesn't work because it creates a (surprise, surprise!) new process, which is independent of the one the current Python program runs in.
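
A quick check makes this visible (a minimal sketch; the limit value is arbitrary):

import resource
import subprocess

# Snapshot the parent's own address-space limit.
before = resource.getrlimit(resource.RLIMIT_AS)

# The ulimit applies only inside the short-lived shell child.
subprocess.call('ulimit -v 12000', shell=True)

after = resource.getrlimit(resource.RLIMIT_AS)
print(before == after)  # True -- the shell's limit died with the shell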

  • Is there any room to make the program more memory-efficient? – TigerhawkT3 May 15 '15 at 21:47
  • 1
    There is, but that will take a while. At the moment, I need to test it and make sure that it doesn't shut the computer down. And having a fail-safe for the memory will be useful later, too. – Arne May 15 '15 at 22:01
  • 1
    Since it's in Python 2.7, how about switching to Python 3 and using a 2-to-3 converter on your program? Python 3 has several performance improvements over Python 2, some of which are memory-related. – TigerhawkT3 May 15 '15 at 22:05
  • 1
    I will do that -- but at this point, I am just curious if or how limiting memory works in python. – Arne May 15 '15 at 22:38
  • Can't you control what you load into memory? I mean, isn't it YOUR script? – oxymor0n May 15 '15 at 22:42
  • It's connected to a constant stream (from the Twitter Streaming API), so.. control is not trivial. I will control it at some point, but right now I just want it not to crash. – Arne May 16 '15 at 13:33
  • 5
    This issue comes up in interactive data analysis all the time - you load a large array (say, 8GB) and start your work. Then you inadvertently square the array (a typo in your code, or misunderstanding an API, etc) and now the script requests 64 GB and the system freezes. :P You would much rather have the process killed than restart your computer. – Gordon Bean Oct 02 '15 at 18:37

1 Answer


resource.RLIMIT_VMEM is the resource corresponding to ulimit -v.

RLIMIT_DATA only limits memory allocated via the brk/sbrk system calls, while newer memory managers tend to use mmap instead.
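
A sketch that makes this visible. Whether the big allocation below slips past the limit depends on your allocator and kernel (newer kernels count mmap'd data against RLIMIT_DATA too), but on a typical glibc/Linux setup of that era it succeeds:

import resource

# Cap the data segment (what brk/sbrk can grow) at ~12 MB.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
resource.setrlimit(resource.RLIMIT_DATA, (12 * 1024 * 1024, hard))

try:
    # Large allocations are typically serviced via mmap, not brk.
    big = bytearray(100 * 1024 * 1024)
    print('100 MB allocated anyway -- the allocator used mmap')
except MemoryError:
    print('refused -- this allocator/kernel counts it against RLIMIT_DATA')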

The second thing to note is that ulimit/setrlimit only affects the current process and its future children.
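
Since children inherit the limits, a variant is to confine only a spawned worker by lowering the limit between fork and exec via Popen's preexec_fn. A sketch, where worker.py and the 12 GB cap are made-up placeholders:

import resource
import subprocess

def limit_child():
    # Runs in the child after fork(), before exec();
    # the parent's limits are left untouched.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (12 * 1024 ** 3, hard))

subprocess.call(['python', 'worker.py'], preexec_fn=limit_child)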

Regarding the AttributeError: 'module' object has no attribute 'RLIMIT_VMEM' message: the resource module docs mention this possibility:

This module does not attempt to mask platform differences — symbols not defined for a platform will not be available from this module on that platform.

According to the bash ulimit source, it falls back to RLIMIT_AS if RLIMIT_VMEM is not defined.
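
Putting the pieces together, a sketch of what should work at the top of the script. The ~12 GB figure mirrors the ulimit -v 12000000 above and is only an example; note that setrlimit takes bytes, while ulimit -v takes KiB:

import resource

# Mirror bash: prefer RLIMIT_VMEM, fall back to RLIMIT_AS
# (Linux doesn't define RLIMIT_VMEM, so the fallback is used there).
rlimit = getattr(resource, 'RLIMIT_VMEM', resource.RLIMIT_AS)

soft, hard = resource.getrlimit(rlimit)
resource.setrlimit(rlimit, (12 * 1024 ** 3, hard))  # ~12 GB, in bytes

try:
    hog = bytearray(16 * 1024 ** 3)  # now fails fast instead of thrashing
except MemoryError:
    print('allocation refused -- the limit is in effect')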

  • 1
    I don't use multithreading, so I hope that is not the problem. But when I enter `RLIMIT_DATA`, I get the following error message: `Traceback (most recent call last): File "my_script.py", line 417, in sys.exit(main()) File "my_script.py", line 391, in main _, hard = resource.getrlimit(resource.RLIMIT_VMEM) AttributeError: 'module' object has no attribute 'RLIMIT_VMEM'` From the list you referenced, all fields could be found -- except this one. I am trying to run it with Python 3.x right now.. – Arne May 16 '15 at 13:31