I have written a bioinformatics Python program that makes heavy use of Python's multiprocessing package. I see a discrepancy in the memory used by the child processes when the program is run on MacOSX versus Linux systems: MacOSX uses much less memory.
When I profile the memory of the child processes on each system, I see a pronounced difference between the platforms. I profile each process when it begins and when it ends using the following call (based on this SO answer; note that MacOSX reports the memory usage of the process in bytes, whereas Linux reports it in kilobytes):
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
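For reference, my measurement looks roughly like this (a minimal sketch; peak_rss_mb is just an illustrative helper name, and it normalises the units since MacOSX reports bytes while Linux reports kilobytes). I call it at the start and at the end of each worker and log both values:

    import resource
    import sys

    def peak_rss_mb():
        # ru_maxrss is the peak resident set size of the calling process.
        # MacOSX reports it in bytes, Linux in kilobytes, so normalise to MB.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        if sys.platform == 'darwin':
            return peak / (1024.0 * 1024.0)  # bytes -> MB
        return peak / 1024.0                 # kilobytes -> MB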
Linux reports that each process requires about 1 GB, whereas MacOSX reports that each one takes roughly 300 MB. What's more, on MacOSX the memory usage seems to start small and grow over the course of the process, whereas on Linux it starts at around 1 GB and stays there.
So my questions:
Does this have something to do with the way each platform handles forking? Perhaps MacOSX spawns a new process whereas Linux forks by default. I am using Python 2.7, so I can't control the start method of the processes (I think; see the sketch after my questions).
Am I right in thinking that this is a forking issue? Has anyone else come across this problem? How can I control the memory usage in Linux?
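To frame that last point: my understanding is that on Python 3.4+ the start method can be chosen explicitly, which is what I would try if I could upgrade. A hypothetical sketch (not my actual code; work is just a stand-in for my real worker function):

    import multiprocessing as mp

    def work(chunk):
        # stand-in for the real per-chunk analysis
        return len(chunk)

    if __name__ == '__main__':
        # Python 3.4+ only: 'spawn' starts each child from a fresh interpreter
        # instead of fork()ing a copy of the parent's address space.
        mp.set_start_method('spawn')
        pool = mp.Pool(4)
        print(pool.map(work, [[1, 2, 3], [4, 5]]))
        pool.close()
        pool.join()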