
I'm using the Parallel function from joblib to parallelize a task. All of the processes take a pandas dataframe as input. To reduce run-time memory usage, is it possible to share this dataframe? All of the processes only read from it. I found a similar solution, but for a numpy array and using multiprocessing, here: Shared-memory objects in multiprocessing

This is a snippet of the code:

from joblib import Parallel, delayed

def func(df, cat):
    # y holds the name of the label column
    a = df[df[y] != cat]
    b = df[df[y] == cat]
    ...

output = Parallel(n_jobs=-1)(delayed(func)(df, cat) for cat in labels)

df is a pandas dataframe and labels is just a list.


1 Answer


I solved it by passing the filtered dataframes directly, so each worker only receives the subset of rows it actually needs instead of a full copy of df:

output = Parallel(n_jobs=-1)(delayed(func)(df[df[target] == cat],
                                           df[df[target] != cat],
                                           cat) for cat in labels)
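
If you would rather keep a single shared copy of the data, the memmapping trick from the linked numpy answer also carries over to joblib: with the default backend, Parallel automatically dumps large numpy arguments to disk and memory-maps them read-only in every worker (controlled by its max_nbytes parameter). Passing the dataframe's underlying array instead of the dataframe itself therefore avoids one full copy per process. A minimal sketch, assuming a purely numeric dataframe and a hypothetical label column y:

import numpy as np
import pandas as pd
from joblib import Parallel, delayed

y = 'label'  # hypothetical name of the label column

def func(values, columns, cat):
    # Rebuild a dataframe around the worker's read-only memmap;
    # copy=False asks pandas not to duplicate the buffer.
    sub = pd.DataFrame(values, columns=columns, copy=False)
    a = sub[sub[y] != cat]
    b = sub[sub[y] == cat]
    return len(a), len(b)  # placeholder for the real work

# Hypothetical data: two numeric features plus a label column.
df = pd.DataFrame(np.random.rand(100000, 3), columns=['f1', 'f2', y])
df[y] = np.random.randint(0, 3, len(df))
labels = df[y].unique()

# Arguments larger than max_nbytes (default '1M') are dumped once and
# memory-mapped read-only into every worker instead of being pickled
# for each task, so the big array is shared rather than copied.
output = Parallel(n_jobs=-1, max_nbytes='1M')(
    delayed(func)(df.values, df.columns, cat) for cat in labels)

Whether rebuilding the frame with copy=False really avoids copying the underlying block depends on the pandas version, so it is worth profiling memory before relying on this.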