I have a function that creates a large mask (a boolean array). I want to call this function several times and build a total mask of the same shape that is True at every index that is True in any of the individual masks.
Since calculating the masks takes a long time, I have parallelized it, but the function now consumes a lot of memory: I first create all the individual masks and only then combine them, which means I have to store all ~40,000 individual masks at once. Is there a way, using multiprocessing, to fold each returned mask into the total mask as soon as it is ready, before the next mask is calculated?
Here is example code demonstrating the problem:
    import numpy as np
    from multiprocessing import Pool

    shape = (50, 50)
    ncores = 4

    def return_something(seed):
        np.random.seed(seed)
        return np.random.choice([True, False], size=shape, p=[0.1, 0.9])

    seeds = np.random.randint(low=0, high=np.iinfo(np.int32).max, size=10)

    # Without parallelisation, very slow:
    mask = np.zeros(shape, dtype=bool)
    for seed in seeds:
        mask |= return_something(seed)

    # With parallelisation, takes too much memory:
    p = Pool(ncores)
    mask_parallel = np.any(list(p.imap(return_something, seeds)), axis=0)
I do not think I understand the (i)map functions well enough. I know Pool.imap returns an iterator that yields results lazily, and that it is possible, for example, to show a progress bar with tqdm using the following code:

    list(tqdm.tqdm(p.imap(fct, inputs), total=len(inputs)))
Since the progress bar is updated while the multiprocessing run is still going, I think it must be possible to access the results during the run and combine them on the fly, but I do not know how.
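To make it concrete, what I imagine is something like the following sketch: iterating over the iterator returned by imap and OR-ing each mask into the running total, so that only one individual mask is held in memory at a time (the seeds and shape are just placeholder values):

    import numpy as np
    from multiprocessing import Pool

    shape = (50, 50)

    def return_something(seed):
        np.random.seed(seed)
        return np.random.choice([True, False], size=shape, p=[0.1, 0.9])

    seeds = np.random.randint(low=0, high=np.iinfo(np.int32).max, size=10)

    total = np.zeros(shape, dtype=bool)
    p = Pool(4)
    # imap yields the masks one at a time, so each one can be
    # OR-ed into the running total and then garbage-collected
    # instead of being accumulated in a list first.
    for single_mask in p.imap(return_something, seeds):
        total |= single_mask
    p.close()
    p.join()

I am not sure whether this is the intended way to consume imap, or whether the order of the results matters here (with a commutative operation like |= it presumably should not).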
Thanks for your help!