
I'm running this code:

    # Get the big (0.5 GB) list of data
    all_ftrajs = get_feature_trajs(traj_top_paths, hp_dict)
    # Bind the big list to the function I want to bootstrap
    bs_func = partial(bs_func, all_ftrajs=all_ftrajs)

    rng = get_rng(seed)
    n_workers = min(n_cores, bs_samples)
    results = []
    if n_workers > 1:
        with Pool(n_workers) as pool:
            for i in range(bs_samples):
                # Bootstrap the list indices
                _, bs_ix = sample_trajectories(all_ftrajs, rng, bs_samples > 1)

                # Accumulate async result handles
                results.append(pool.apply_async(func=bs_func,
                                                args=(hp_dict, bs_ix, seed,
                                                      bs_dir.joinpath(f"{i}.pkl"), hp_idx),
                                                kwds=kwargs))
            # Wait on the results
            for r in results:
                r.get()

            # Close off the pool
            pool.close()
            pool.join()

This particular code bootstraps an analysis 100 times (bs_samples = 100) by resampling the elements of a list of numpy arrays, all_ftrajs, and analysing each resample:

    def bs_func(hp_dict: Dict[str, List[Union[str, int]]],
                bs_ix: np.ndarray, seed: Union[int, None],
                out_dir: Path, hp_idx: int,
                lags: List[int], all_ftrajs: List[np.ndarray]):
        # Bootstrap the list of numpy arrays
        feat_trajs = [all_ftrajs[i] for i in bs_ix]

        # Do the analysis
        try:
            tica, kmeans = discretize_trajectories(hp_dict, feat_trajs, seed)
            disc_trajs = kmeans.dtrajs
            mods_by_lag = estimate_msms(disc_trajs, lags)
            outputs = score_msms(mods_by_lag)
            outputs.ix = hp_idx
            write_outputs(outputs, out_dir)
        except Exception as e:
            # Log the failure but carry on; the worker still returns True
            logging.info(e)
        return True
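
One detail worth flagging in case it matters (a toy illustration with hypothetical names): because the except block logs the error and the function still returns True, r.get() in the parent never sees worker-side failures:

    from multiprocessing import Pool

    def worker(x):
        try:
            return 1 / x          # raises ZeroDivisionError for x == 0
        except Exception as e:
            print(e)              # swallowed, like logging.info(e) above
        return True

    if __name__ == "__main__":
        with Pool(2) as pool:
            res = pool.apply_async(worker, (0,))
            print(res.get())      # prints True; no exception propagates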

I do this bootstrapping on a series of different experiments (trials). After a handful of trials the program freezes:

  • all processes are still active; I have 12 logical cores on my machine and I'm using a pool of size 2.
  • swap and RAM are both maxed out (2 GB and 16 GB respectively).
  • CPU usage is 0.0% across all processes.

When running this in serial, the single process uses no more than 10% of memory according to htop.
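
For reference, the serial path is just the else branch of the n_workers check above; a sketch, assuming it calls the same bound function in-process (same names as in the code above, so not standalone):

    else:
        for i in range(bs_samples):
            _, bs_ix = sample_trajectories(all_ftrajs, rng, bs_samples > 1)
            # Same call made in this process, so nothing is pickled
            results.append(bs_func(hp_dict, bs_ix, seed,
                                   bs_dir.joinpath(f"{i}.pkl"), hp_idx, **kwargs))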

I have tried:

  • passing the big data list as a parameter to bs_func, using the partial construction inspired by the answer to this question: Shared-memory objects in multiprocessing (a stripped-down version of the pattern is sketched after this list)
  • running it in serial (this works)
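
In case a minimal version helps, here is a stripped-down, self-contained sketch of the pattern I'm using, with toy data and made-up names (analyse, big_data) standing in for the real pipeline:

    from functools import partial
    from multiprocessing import Pool

    import numpy as np

    def analyse(ix, big_data):
        # Stand-in for bs_func: resample the list by index, then do some work
        sample = [big_data[i] for i in ix]
        return sum(arr.sum() for arr in sample)

    if __name__ == "__main__":
        big_data = [np.random.rand(1000) for _ in range(100)]  # toy all_ftrajs
        analyse_p = partial(analyse, big_data=big_data)        # bind the big list

        rng = np.random.default_rng(42)
        with Pool(2) as pool:
            handles = [pool.apply_async(analyse_p,
                                        (rng.integers(0, 100, size=100),))
                       for _ in range(10)]
            print([r.get() for r in handles])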

I'm running Python 3.8 on Ubuntu 20.04.4 LTS, 64-bit. I've got 15.5 GiB of memory and an AMD Ryzen 5 3600 6-core processor x 12 (this is copied from my 'About' section).

Questions:

  • Is there an obvious fix to my code to make it work?
  • Are there any other solutions that don't require too much refactoring of code?

Thanks and best wishes,

Rob

  • In Python, multiprocessing duplicates memory for each subprocess (if you have a 100 MB list in the main process, each subprocess will also hold a 100 MB copy, so 2 subprocesses => 300 MB total). Threading doesn't duplicate memory. – Wonka Apr 07 '22 at 09:27
  • Thanks - I've edited the original post to say that the memory usage in serial is 10%. So running two processes should be fine. – Robert Edward Arbon Apr 07 '22 at 18:02
