I am working on a CPU-intensive ML problem which is centered around an additive model. Since addition is the main operation, I can divide the input data into pieces, spawn multiple partial models, and then merge them with the overridden __add__ method.
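For context, the additive behaviour is roughly the following (a simplified sketch only; the real FragmentModel holds much more state, and the counts dict here is just a stand-in):

class FragmentModel:
    def __init__(self, order, indata=None, shuffle=False):
        self.order = order
        self.counts = {}            # stand-in for the real model state
        if indata is not None:
            self._fit(indata, shuffle)

    def _fit(self, indata, shuffle):
        ...                         # builds self.counts from one input file

    def __add__(self, other):
        # Merging two partial models is element-wise addition of their state,
        # which is why the input can be split and processed independently.
        merged = FragmentModel(order=self.order)
        merged.counts = dict(self.counts)
        for key, value in other.counts.items():
            merged.counts[key] = merged.counts.get(key, 0) + value
        return merged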
The code relating to the multiprocessing looks like this:
import copy
from functools import partial
from multiprocessing.pool import ThreadPool


def pool_worker(filename, doshuffle):
    print(f"Processing file: {filename}")
    with open(filename, 'r') as f:
        # Build one partial model from a single input file
        return FragmentModel(order=args.order, indata=f, shuffle=doshuffle)


def generateModel(is_mock=False, save=True):
    model = None
    with ThreadPool(args.nthreads) as pool:
        partial_models = pool.imap_unordered(
            partial(pool_worker, doshuffle=is_mock), args.input)
        for i, m in enumerate(partial_models):
            logger.info(f'Starting to merge model {i}')
            if model is None:
                model = copy.deepcopy(m)
            else:
                model += m          # merge via the overridden __add__
            logger.info(f'Done merging...')
    return model
The issue is that the memory consumption scales exponentially with the model order, so at order 4 each instance of the model is about 4-5 GB, which causes the ThreadPool to crash because the intermediate model objects are then not picklable.
I have read about this a bit, and it appears that even if the pickling were not an issue, passing data like this is still extremely inefficient, as pointed out in the comments on this answer.
There is very little guidance, however, on how to use shared memory for this purpose. Is it possible to avoid this problem without having to change the internals of the model object?
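To make it concrete, this is the kind of shared-memory approach I mean (a minimal sketch using multiprocessing.shared_memory, with a plain numpy array standing in for the model state, which is exactly the part I would rather not have to restructure):

import numpy as np
from multiprocessing import Process, shared_memory

def worker(shm_name, shape, dtype):
    # Attach to the existing shared block instead of receiving a pickled copy
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr += 1                        # accumulate into the shared buffer in place
    shm.close()

if __name__ == '__main__':
    data = np.zeros(10, dtype=np.int64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data
    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()
    print(shared)                   # the parent sees the worker's updates
    shm.close()
    shm.unlink()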