
I have a simple string-matching script that tests just fine with multiprocessing, using up to 8 Pool workers, on my local Mac with 4 cores. However, the same script on an AWS c1.xlarge with 8 cores generally kills all but 2 workers, the CPU only works at 25%, and after a few rounds it stops with a MemoryError.

I'm not too familiar with server configuration, so I'm wondering if there are any settings to tweak?

The pool implementation is shown below, but it doesn't seem to be the issue, since it works locally. There are several thousand targets per worker, and it doesn't run past the first five or so. Happy to share more of the code if necessary.

```python
pool = Pool(processes=numProcesses)
totalTargets = len(getTargets('all'))
targetsPerBatch = totalTargets / numProcesses
pool.map_async(runMatch,
               itertools.izip(itertools.repeat(targetsPerBatch),
                              xrange(0, totalTargets, targetsPerBatch))).get(99999999)
pool.close()
pool.join()
```
Stefan
  • How much virtual memory does each of the 8 workers use on your Mac (according to Activity Monitor or `top`)? – abarnert Sep 09 '13 at 20:35
  • What are you doing? Look at the code! facepalm.jpg. Use the `chunksize` keyword and `imap`. – eri Sep 09 '13 at 20:41
  • Thanks - 4GB of virtual memory each, 500-1000MB of real memory (there's 8GB total available). – Stefan Sep 09 '13 at 20:51
  • 1000MB of real memory? Not surprised. – eri Sep 09 '13 at 20:55
  • Also, pool processes must be able to import the module where the worker function is declared. Wrap your code in functions and use `if __name__ == "__main__"` to prevent execution on import. – eri Sep 09 '13 at 20:58
  • So adding them all up, you're using a total of 6GB of real memory, which is fine on both your Mac and the xlarge, but 32GB of VM, which is fine on your Mac but not on the xlarge. My answer explains why, and how to work around it. However, if you can reduce that VM use, it'll be a much better solution. – abarnert Sep 09 '13 at 21:00
  • @eri: He's just giving us a fragment of the code. We know his real code works on the Mac, and the only problem on Linux is a MemoryError, so he's obviously not making the mistake of creating a new pool in each process and exponentially forkbombing himself. – abarnert Sep 09 '13 at 21:01
  • @eri: And I don't know why you're facepalming over the `map_async`. The default chunksize is `len/(pool*4)`. And besides, you can configure that just as easily with `map_async` as with `imap`. The only place `map_async` potentially wastes memory is in the parent process, not the workers, and given that his workers are each using 4GB of VM, they're the problem. – abarnert Sep 09 '13 at 21:08
  • Where does the data come from? He could use a database cursor as an iterator, file lines as an iterator, etc.; that would reduce memory. And if he spawns the pool at module level, `getTargets('all')` fires at import time, so each worker process imports the module, calls it, and loads useless data into itself. – eri Sep 10 '13 at 05:54

1 Answer


The MemoryError means you're running out of system-wide virtual memory. How much virtual memory you have is an abstract thing, based on the actual physical RAM plus swapfile size plus stuff that's paged into memory from other files and stuff that isn't paged anywhere because the OS is being clever and so on.

According to your comments, each process averages 0.75GB of real memory, and 4GB of virtual memory. So, your total VM usage is 32GB.

One common reason for this gap between real and virtual usage is that each process might peak at 4GB of allocations but spend almost all of its time using a lot less than that. Python rarely releases memory back to the OS; the unused pages just get paged out.

Anyway, 6GB of real memory is no problem on an 8GB Mac or a 7GB c1.xlarge instance.

And 32GB of VM is no problem on a Mac. A typical OS X system has virtually unlimited VM size—if you actually try to use all of it, it'll start creating more swap space automatically, paging like mad, and slowing your system to a crawl and/or running out of disk space, but that isn't going to affect you in this case.

But 32GB of VM is likely to be a problem on Linux. A typical Linux system has fixed-size swap, and doesn't let you push the VM beyond what it can handle. (It has a different trick that avoids creating probably-unnecessary pages in the first place… but once you've created the pages, you have to have room for them.) I'm not sure how an xlarge comes configured, but the `swapon` tool will tell you how much swap you've got (and how much you're using).
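For example, on the instance itself (read-only checks, assuming the standard util-linux and procps tools are installed):

```shell
swapon -s    # summary of active swap areas (reads /proc/swaps)
free -m      # RAM and swap totals/usage in MB
```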

Anyway, the easy solution is to create and enable an extra 32GB swapfile on your xlarge.
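A sketch of what that looks like (needs root; the path `/mnt/extra.swap` is my choice, not anything AWS-specific):

```shell
# Allocate a 32GB file, mark it as swap, and enable it.
sudo dd if=/dev/zero of=/mnt/extra.swap bs=1M count=32768
sudo chmod 600 /mnt/extra.swap   # swap files must not be world-readable
sudo mkswap /mnt/extra.swap      # write the swap signature
sudo swapon /mnt/extra.swap      # enable it immediately
swapon -s                        # verify it shows up
```

To survive a reboot you'd also add it to `/etc/fstab`, but for a transient EC2 instance that may not matter.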

However, a better solution would be to reduce your VM use. Often each subprocess is doing a whole lot of setup work that creates intermediate data that's never needed again; you can use multiprocessing to push that setup into different processes that quit as soon as they're done, freeing up the VM. Or maybe you can find a way to do the processing more lazily, to avoid needing all that intermediate data in the first place.

abarnert
  • Thanks a lot for the detailed response! I had noticed memory use ballooning but haven't been able to optimize the multiprocess to more efficiently share resources or run the computation. I'll ask a separate question showing the key pieces of the code to see if I can get some assistance there. – Stefan Sep 09 '13 at 21:57