I have several processes, each completing tasks that require a single large numpy array. The array is only ever read (the processes are searching it for appropriate values).
If each process loads its own copy of the data, I receive a memory error.
I am therefore trying to minimise memory usage by using a Manager to share the same array between the processes.
However, I still receive a memory error. I can load the array once in the main process, but the moment I try to make it an attribute of the manager's Namespace I receive a memory error. I assumed Managers acted like pointers and allowed separate processes (which normally only have access to their own memory) to access this shared memory as well. However, the error mentions pickling:
Traceback (most recent call last):
  File <PATH>, line 63, in <module>
    ns.pp = something
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\managers.py", line 1021, in __setattr__
    return callmethod('__setattr__', (key, value))
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\managers.py", line 716, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
MemoryError
I assume the numpy array is actually being copied when assigned to the manager, but I may be wrong.
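If that assumption is right, pickling the array should produce a byte string about the same size as the array itself. Here is a quick sanity check along those lines (the array here is just a small stand-in for my real data):

import pickle
import numpy as np

arr = np.zeros((1000, 1000))       # stand-in for the real data: 8MB of float64
print(arr.nbytes)                  # 8000000
print(len(pickle.dumps(arr)))      # roughly the same again, plus a small header

So, if I understand the traceback, assigning the array to the Namespace serialises a full copy of it and sends it through a pipe to the manager process.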
To make matters a little more irritating, I am on a machine with 32GB of memory, and watching the memory usage it only increases a little before crashing, by 5%-10% at most.
Could someone explain why making the array an attribute of the namespace takes up even more memory? And why my program won't use some of the spare memory available? (I have already read the Namespace and Manager docs, as well as related manager and namespace threads on SO.)
I am running Windows Server 2012 R2 and 32-bit Python 3.5.2.
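As far as I understand, a 32-bit process is capped at roughly 2GB of addressable memory regardless of how much RAM the machine has; this is how I confirmed the build really is 32-bit:

import sys
print(sys.maxsize)    # 2147483647 (2**31 - 1) on a 32-bit build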
Here is some code demonstrating my problem (you will need to substitute your own file for large.txt; mine is ~75MB of tab-delimited strings):
import multiprocessing
import numpy as np
if __name__ == '__main__':
    # load Price Paid Data and assign it to the manager's namespace
    mgr = multiprocessing.Manager()
    ns = mgr.Namespace()
    ns.data = np.genfromtxt('large.txt')

    # Alternative proving this works for smaller objects
    # ns.data = 'Test PP data'
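For completeness, this is roughly how I intend the worker processes to consume the shared array; the worker function and the search condition below are illustrative, not my real code:

import multiprocessing
import numpy as np

def worker(ns):
    # Every access to ns.data goes through the manager proxy.
    matches = np.where(ns.data > 0.5)   # stand-in for my real search
    print(len(matches[0]))

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    ns = mgr.Namespace()
    ns.data = np.random.rand(10)        # small stand-in so this actually runs
    procs = [multiprocessing.Process(target=worker, args=(ns,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()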