
There is a performance issue when calling os.system() after pre-allocating a large amount of memory (e.g. a NumPy array). The slowdown grows with the amount of memory allocated.

test.py:

import os
import sys
import time
import numpy

start = time.clock()
test = int(sys.argv[1])
a = numpy.zeros((test,500,500))
for i in range(test) :
    os.system("echo true > /dev/null")
elapsed = (time.clock() - start)
print(elapsed)

The per-iteration time increases dramatically:

edouard@thorin:~/now3/code$ python test.py 100
0.64
edouard@thorin:~/now3/code$ python test.py 200
2.09
edouard@thorin:~/now3/code$ python test.py 400
14.26

This should not be related to virtual memory. Is it a known issue?

egouden
  • This is probably partly related to the amount of _contiguous_ memory you have available. Even if you have a lot of ram, allocating a large contiguous block can take more time than you might expect. If you run the `python test.py 1000` test several times in a row, do you see a large variation in the time it takes? – Joe Kington May 14 '12 at 15:36
  • @JoeKington For me allocation takes only about ~1s of the 35s to run the test. – mgilson May 14 '12 at 15:42
  • After further investigation, the question and code have been modified for the sake of clarity. There is a small variation in the time it takes. And yes allocation time is very fast. – egouden May 14 '12 at 15:44
  • Please include your Python version and details of the operating system. – NPE May 14 '12 at 15:50
  • python Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] on linux2 – egouden May 14 '12 at 15:55
  • Linux thorin 2.6.32-36-server #79-Ubuntu SMP Tue Nov 8 22:44:38 UTC 2011 x86_64 GNU/Linux – egouden May 14 '12 at 15:56

2 Answers


You seem to have narrowed the problem down to os.system() taking longer after you've allocated a large NumPy array.

Under the covers, system() uses fork(). Even though fork() is supposed to be very cheap (due to its use of copy-on-write), it turns out that things are not quite as simple.

In particular, there are known issues with Linux's fork() taking longer for larger processes. See, for example:

Both documents are fairly old, so I am not sure what the state of the art is. However, the evidence suggests that you've encountered an issue of this sort.
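
A rough way to check whether fork() itself is responsible is to time it directly, once before and once after the large allocation. The following is only a sketch: time_fork is a hypothetical helper name, and it assumes a Unix-like system where os.fork() is available.

```python
import os
import time


def time_fork(repeats=20):
    """Return the average wall-clock seconds for one fork() plus child exit."""
    start = time.time()
    for _ in range(repeats):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, skipping interpreter cleanup handlers.
            os._exit(0)
        # Parent: wait for the child so no zombies accumulate.
        os.waitpid(pid, 0)
    return (time.time() - start) / repeats
```

Comparing time_fork() at startup with time_fork() after the numpy.zeros(...) call would indicate whether the cost per fork rises with the size of the process.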

If you can't get rid of those system() calls, I would suggest two avenues of research:

  • Look into enabling huge pages.
  • Consider the possibility of spawning an auxiliary process on startup, whose job would be to invoke the necessary system() commands.
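
The second suggestion could look roughly like the sketch below. It is an illustration under stated assumptions, not a drop-in solution: CommandHelper and its line-based protocol are made up for this example, commands must not contain newlines, and the output of each command is sent to stderr so that it cannot mix with the status replies. The helper has to be created before the large allocation, while the parent process is still small.

```python
import subprocess
import sys

# Source of the helper process: read one shell command per line from stdin,
# run it, and report its exit status on stdout.
_HELPER = r"""
import subprocess, sys
for line in iter(sys.stdin.readline, ""):
    # Send the command's own output to stderr to keep stdout for statuses.
    status = subprocess.call(line.rstrip("\n"), shell=True, stdout=sys.stderr)
    sys.stdout.write("%d\n" % status)
    sys.stdout.flush()
"""


class CommandHelper(object):
    """Hypothetical wrapper around a small, long-lived helper interpreter."""

    def __init__(self):
        # Start the helper in a fresh interpreter so that it stays small.
        self._proc = subprocess.Popen(
            [sys.executable, "-c", _HELPER],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )

    def run(self, command):
        """Run a single-line shell command in the helper; return its status."""
        self._proc.stdin.write(command + "\n")
        self._proc.stdin.flush()
        return int(self._proc.stdout.readline())

    def close(self):
        """Close the helper's stdin so its loop ends, then wait for it."""
        self._proc.stdin.close()
        self._proc.wait()
```

In the test script from the question, one would create helper = CommandHelper() before the numpy.zeros(...) line and replace os.system(...) in the loop with helper.run("echo true > /dev/null"). Each command is then forked from the small helper rather than from the large main process.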
NPE
  • This is the best answer for the time being. Thank you also for your suggestions but I will preferably drop those system calls ;-) – egouden May 14 '12 at 16:20

What happens if you don't use the os.system call?

for me:

python test.py 10   # 0.14
python test.py 100  # 1.18
python test.py 1000 # 11.77

It grows approximately an order of magnitude each time without os.system. So, I'd say your problem is in the system call, not the performance of numpy. (This is confirmed by running the same test again with the numpy portion of the code commented out.) At this point, the question becomes "Why is it slow(er) to do repeated system calls?" ... Unfortunately, I don't have an answer for that.

Interestingly enough, if I do this in bash, there is no problem (it returns almost immediately)...

time for i in `seq 1 1000`; do echo true > /dev/null; done

It also seems that the problem isn't just os.system: subprocess.Popen suffers from the same malady... (although subprocess may just call os.system under the hood, I don't actually know...)

EDIT

This is getting better and better. In my previous tests, I was leaving in the allocation of the numpy array ... If you remove the allocation of the numpy array as well, the test goes relatively fast. However, the allocation of the array (1000,800,800) only takes ~1 second. So the allocation isn't taking all (or even much) of the time, and the assignment of data to the array doesn't take much time either, but whether the array is allocated does affect how long it takes for the system call to execute. Very weird.

mgilson
  • Indeed but don't change the question ;-). This is only an example. My code is much more complex. – egouden May 14 '12 at 14:55
  • @user1393730 Sorry, I didn't read your question carefully enough. I'll edit my answer to reflect that the problem is (probably) due to the system call and not the use of numpy. – mgilson May 14 '12 at 14:59