I am using the Python multiprocessing module and am looking for a way to attach read-only data to a worker once, when the process is constructed. I want this data to persist across multiple jobs.
I planned to subclass Process and attach data to the class, something like this:
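import multiprocessing

class LotsOfDataHolder:
    """Placeholder for the large read-only data set."""

class Worker(multiprocessing.Process):
    # Class attribute: created once, when the class body is executed.
    _lotsofdata = LotsOfDataHolder()

    def run(self):
        # Process.run() takes no arguments; per-job arguments would have
        # to come in through the constructor, and any return value is discarded.
        pass  # do something with self._lotsofdata

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = Worker()
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()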
However, the number of jobs is on the order of 500k, so I would rather use Pool, and I don't see a way to tell Pool to use a subclass of Process.
Is there a way to tell Pool to use a subclass of Process or is there another way to persist data on a worker for multiple jobs that works with Pool?
Note: There is a long explanation here, but subclassing Process was not specifically discussed.
*I see now that the args are passed to the Process constructor, which makes my approach all the more unlikely to work.