
I am wondering about a simple and popular alternative to the standard Python 2.7 `multiprocessing` module. My problem with this module is its huge memory consumption: every child process appears to hold the same amount of memory as the parent process. In my case multi-threading isn't an option because every child process does heavy parsing. The OS is Linux.
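
For context, a minimal sketch of the kind of usage assumed here (the data and the `parse` helper are hypothetical, not my actual code): the parent builds a large structure first and only then creates the `multiprocessing.Pool`, so every child is forked from an already-large process image.

```python
# Hypothetical sketch of the pattern described above, not the real code.
import multiprocessing

def parse(item):
    return len(item)  # placeholder for the heavy parsing done in each child

if __name__ == '__main__':
    # Large structure built in the parent *before* the workers are forked.
    big_data = ['x' * 1000 for _ in range(10 ** 6)]

    pool = multiprocessing.Pool(processes=4)
    results = pool.map(parse, big_data)
    pool.close()
    pool.join()
```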

Yuri Levinsky
  • Which OS are you using? How are you measuring the memory consumption? Modern OSes use Copy On Write strategies to share memory so they should not consume the same amount of memory but just the amount of memory which they need. – noxdafox Aug 05 '15 at 11:56
  • The OS type is Linux – Yuri Levinsky Aug 05 '15 at 11:57
  • In this case, as long as you're not changing the library's process spawning strategy, the memory consumption will be minimal. Linux processes are not cloning the entire memory of parent processes but using a COW strategy. – noxdafox Aug 05 '15 at 12:03
  • To add to what @noxdafox says, try using `ps_mem.py` to look at memory consumption, it's better than `top`, `ps`, etc: https://github.com/pixelb/ps_mem/ – foz Aug 05 '15 at 12:06
  • Please refer to the following: http://stackoverflow.com/questions/14749897/python-multiprocessing-memory-usage. This is exactly my problem and they say differently. – Yuri Levinsky Aug 05 '15 at 12:12
  • In that very question the answerer suggests that you use `multiprocessing` before loading huge data to ensure child processes have a small memory footprint. Is it not viable in your code? Why? – Felipe Lema Aug 05 '15 at 12:56
  • There should be a more convenient way to do it. By the way, transferring data to already opened processes might be problematic from a performance point of view. – Yuri Levinsky Aug 05 '15 at 13:26
  • That question says exactly what we're telling you. Linux uses COW, so it's not duplicating the data for each process but sharing it smartly. If those processes are consuming a lot of memory, it's because they probably allocate it for their own purposes. If the data the processes are working on is modified a lot, then they will actually consume a similar amount of memory, but that data is no longer shared; it's unique to each process. – noxdafox Aug 06 '15 at 07:42
  • So, what exactly do I have to do to minimize the memory consumption? – Yuri Levinsky Aug 06 '15 at 12:58
  • @YuriLevinsky: there are no details in your question. If you don't want to call `fork()` before allocating memory in the parent, then how to minimize the memory consumption, and whether you should worry about it at all, depends on the specific case. In *some* cases, you could just [share the data](http://stackoverflow.com/q/7894791/4279) – jfs Aug 06 '15 at 20:57
  • This is good for a beginning. NumPy is the fundamental package for scientific computing with Python, and multi-process data sharing is not its main goal. Is there any more commonly used method to do the same, some popular solution? – Yuri Levinsky Aug 10 '15 at 07:46
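
As suggested in the comments above, one way to keep the children small is to create the worker pool *before* the parent allocates any large data, and let each worker load and parse its own input. A minimal sketch under those assumptions (the file names and the `parse_file` helper are made up); only short path strings get pickled to the workers, so their copy-on-write images stay small:

```python
import multiprocessing

def parse_file(path):
    # Each worker loads and parses its own input, so the large data
    # lives only in the child that needs it.
    with open(path) as f:
        return len(f.read())  # placeholder for the heavy parsing

if __name__ == '__main__':
    paths = ['input_%d.txt' % i for i in range(100)]  # hypothetical inputs

    # Fork the workers while the parent's memory footprint is still small.
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(parse_file, paths)
    pool.close()
    pool.join()
```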

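If a large read-only array really does have to be visible to every worker, the comments also point at sharing the data explicitly instead of copying it. A minimal sketch assuming the data fits in a flat array of doubles (sizes and names are invented); `multiprocessing.Array(..., lock=False)` allocates one shared-memory buffer that NumPy can wrap without copying:

```python
import ctypes
import multiprocessing
import numpy as np

def init(base):
    # Make the shared buffer visible inside each worker process.
    global shared_base
    shared_base = base

def worker(i):
    # Wrap the shared buffer again in the child; no copy is made.
    arr = np.frombuffer(shared_base, dtype=np.float64)
    return arr[i::4].sum()  # placeholder for real work on a slice

if __name__ == '__main__':
    n = 10 ** 6
    base = multiprocessing.Array(ctypes.c_double, n, lock=False)
    data = np.frombuffer(base, dtype=np.float64)
    data[:] = np.random.rand(n)  # fill the shared buffer once, in the parent

    pool = multiprocessing.Pool(processes=4, initializer=init, initargs=(base,))
    print(sum(pool.map(worker, range(4))))
    pool.close()
    pool.join()
```
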
0 Answers