I've got a large dict
-like object that needs to be shared between a number of worker processes. Each worker reads a random subset of the information in the object and does some computation with it. I'd like to avoid copying the large object as my machine quickly runs out of memory.
I was playing with the code for this SO question and I modified it a bit to use a fixed-size process pool, which is better suited to my use case. This however seems to break it.
from multiprocessing import Process, Pool
from multiprocessing.managers import BaseManager
class numeri(object):
def __init__(self):
self.nl = []
def getLen(self):
return len(self.nl)
def stampa(self):
print self.nl
def appendi(self, x):
self.nl.append(x)
def svuota(self):
for i in range(len(self.nl)):
del self.nl[0]
class numManager(BaseManager):
pass
def produce(listaNumeri):
print 'producing', id(listaNumeri)
return id(listaNumeri)
def main():
numManager.register('numeri', numeri, exposed=['getLen', 'appendi',
'svuota', 'stampa'])
mymanager = numManager()
mymanager.start()
listaNumeri = mymanager.numeri()
print id(listaNumeri)
print '------------ Process'
for i in range(5):
producer = Process(target=produce, args=(listaNumeri,))
producer.start()
producer.join()
print '--------------- Pool'
pool = Pool(processes=1)
for i in range(5):
pool.apply_async(produce, args=(listaNumeri,)).get()
if __name__ == '__main__':
main()
The output is
4315705168
------------ Process
producing 4315705168
producing 4315705168
producing 4315705168
producing 4315705168
producing 4315705168
--------------- Pool
producing 4299771152
producing 4315861712
producing 4299771152
producing 4315861712
producing 4299771152
As you can see, in the first case all worker processes get the same object (by id). In the second case, the id is not the same. Does that mean the object is being copied?
P.S. I don't think that matters, but I am using joblib
, which internally used a Pool
:
from joblib import delayed, Parallel
print '------------- Joblib'
Parallel(n_jobs=4)(delayed(produce)(listaNumeri) for i in range(5))
which outputs:
------------- Joblib
producing 4315862096
producing 4315862288
producing 4315862480
producing 4315862672
producing 4315862352