
I'm running this code:

    # Get the big (0.5 GB) list of data
    all_ftrajs = get_feature_trajs(traj_top_paths, hp_dict)
    # Bind the big list to the function I want to bootstrap
    bs_func = partial(bs_func, all_ftrajs=all_ftrajs)

    rng = get_rng(seed)
    n_workers = min(n_cores, bs_samples)
    results = []
    if n_workers > 1:
        with Pool(n_workers) as pool:
            for i in range(bs_samples):
                # Bootstrap the list indices
                _, bs_ix = sample_trajectories(all_ftrajs, rng, bs_samples > 1)

                # Accumulate async result handles
                results.append(pool.apply_async(func=bs_func,
                                                args=(hp_dict, bs_ix, seed,
                                                      bs_dir.joinpath(f"{i}.pkl"), hp_idx),
                                                kwds=kwargs))
            # Wait on the results
            for r in results:
                r.get()

            # Close off the pool
            pool.close()
            pool.join()

This particular code bootstraps an analysis 100 times (bs_samples = 100) by resampling the elements of a list of numpy arrays, all_ftrajs, and analysing each resample:

    def bs_func(hp_dict: Dict[str, List[Union[str, int]]],
                bs_ix: np.ndarray, seed: Union[int, None],
                out_dir: Path, hp_idx: int,
                lags: List[int], all_ftrajs: List[np.ndarray]):
        # Bootstrap the list of numpy arrays
        feat_trajs = [all_ftrajs[i] for i in bs_ix]

        # Do the analysis
        try:
            tica, kmeans = discretize_trajectories(hp_dict, feat_trajs, seed)
            disc_trajs = kmeans.dtrajs
            mods_by_lag = estimate_msms(disc_trajs, lags)
            outputs = score_msms(mods_by_lag)
            outputs.ix = hp_idx
            write_outputs(outputs, out_dir)
        except Exception as e:
            # Log the failure but carry on; the worker still returns True
            logging.info(e)
        return True
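
One detail worth flagging in case it matters (a toy illustration with hypothetical names): because the except block logs the error and the function still returns True, r.get() in the parent never sees worker-side failures:

    from multiprocessing import Pool

    def worker(x):
        try:
            return 1 / x          # raises ZeroDivisionError for x == 0
        except Exception as e:
            print(e)              # swallowed, like logging.info(e) above
        return True

    if __name__ == "__main__":
        with Pool(2) as pool:
            res = pool.apply_async(worker, (0,))
            print(res.get())      # prints True; no exception propagates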

I do this bootstrapping on a series of different experiments (trials). After a handful of trials the program freezes:

  • all processes are still active; I have 12 logical cores on my machine and I'm using a pool of size 2.
  • swap and RAM are both maxed out (2 GB and 16 GB respectively).
  • CPU usage is 0.0% across all processes.

When running this in serial, the single process uses no more than 10% of memory according to htop.
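
For reference, the serial path is just the else branch of the n_workers check above; a sketch, assuming it calls the same bound function in-process (same names as in the code above, so not standalone):

    else:
        for i in range(bs_samples):
            _, bs_ix = sample_trajectories(all_ftrajs, rng, bs_samples > 1)
            # Same call made in this process, so nothing is pickled
            results.append(bs_func(hp_dict, bs_ix, seed,
                                   bs_dir.joinpath(f"{i}.pkl"), hp_idx, **kwargs))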

I have tried:

  • passing the big data list as a parameter to bs_func, using the partial construction inspired by the answer to this question: Shared-memory objects in multiprocessing (a stripped-down version of the pattern is sketched after this list)
  • running it in serial (this works)
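
In case a minimal version helps, here is a stripped-down, self-contained sketch of the pattern I'm using, with toy data and made-up names (analyse, big_data) standing in for the real pipeline:

    from functools import partial
    from multiprocessing import Pool

    import numpy as np

    def analyse(ix, big_data):
        # Stand-in for bs_func: resample the list by index, then do some work
        sample = [big_data[i] for i in ix]
        return sum(arr.sum() for arr in sample)

    if __name__ == "__main__":
        big_data = [np.random.rand(1000) for _ in range(100)]  # toy all_ftrajs
        analyse_p = partial(analyse, big_data=big_data)        # bind the big list

        rng = np.random.default_rng(42)
        with Pool(2) as pool:
            handles = [pool.apply_async(analyse_p,
                                        (rng.integers(0, 100, size=100),))
                       for _ in range(10)]
            print([r.get() for r in handles])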

I'm running Python 3.8 on Ubuntu 20.04.4 LTS, 64-bit. I've got 15.5 GiB of memory and an AMD Ryzen 5 3600 6-core processor x 12 (this is copied from my 'About' section).

Questions:

  • Is there an obvious fix to my code to make it work?
  • Are there any other solutions that don't require too much refactoring of code?

Thanks and best wishes,

Rob

  • In Python, multiprocessing duplicates memory for each subprocess (if you have a 100 MB list in the main process, each subprocess will also hold a 100 MB copy, so 2 subprocesses => 300 MB total). Threading doesn't duplicate memory. – Wonka Apr 07 '22 at 09:27
  • Thanks - I've edited the original post to say that the memory usage in serial is 10%. So running two processes should be fine. – Robert Edward Arbon Apr 07 '22 at 18:02
