
When I run predict_proba on a dataframe without multiprocessing, I get the expected behavior. The code is as follows:

probabilities_data = classname.perform_model_prob_predictions_nc(prediction_model, vectorized_data)

where perform_model_prob_predictions_nc is:

def perform_model_prob_predictions_nc(model, dataFrame): 
    try:
        return model.predict_proba(dataFrame)
    except AttributeError:
        logging.error("AttributeError occurred",exc_info=True)

But when I try to run the same function using chunks and multiprocessing:

probabilities_data = classname.perform_model_prob_predictions(prediction_model, chunks, cores)

where perform_model_prob_predictions is:

def perform_model_prob_predictions(model, dataFrame, cores=4): 
    try:
        with Pool(processes=cores) as pool:
            result = pool.map(model.predict_proba, dataFrame)
            return result
    except Exception:
        logging.error("Error occurred", exc_info=True)

I get the following error:

PicklingError: Can't pickle <function OneVsRestClassifier.predict_proba at 0x14b1d9730>: it's not the same object as sklearn.multiclass.OneVsRestClassifier.predict_proba

For reference:

cores = 4
vectorized_data = pd.DataFrame(...)
chunk_size = len(vectorized_data) // cores + cores
chunks = [df_chunk for g, df_chunk in vectorized_data.groupby(np.arange(len(vectorized_data)) // chunk_size)]
YihanBao

1 Answer


Pool internally uses a Queue, and anything that goes through it has to be pickled. The error tells you that OneVsRestClassifier.predict_proba cannot be pickled.
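A minimal standard-library illustration of where the "it's not the same object as" message comes from: pickle stores a function by reference (module plus qualified name) and refuses to pickle it if looking that name up returns a different object. The names predict, original and try_pickle are hypothetical, and this is plain Python, not scikit-learn. The assumption, not confirmed in the question, is that the scikit-learn version in use exposes predict_proba through a wrapper, so the callable handed to pool.map is not the object found at sklearn.multiclass.OneVsRestClassifier.predict_proba.

```python
import pickle


def predict(x):
    """Hypothetical stand-in for the original method."""
    return x


# Keep a reference to the first function, then rebind the module-level name,
# so looking up "predict" no longer returns that same object.
original = predict


def predict(x):  # deliberate rebinding for the demo
    return -x


def try_pickle(obj):
    """Return None if obj pickles, otherwise the PicklingError message."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


print(try_pickle(predict))   # succeeds: the name resolves to this very object
print(try_pickle(original))  # fails: the name now points at a different object
```

The demo assumes it is run as a script, so that pickle can look the names up in the __main__ module.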

You have several options; some are described in this SO post. Another option is to use joblib with the loky backend. The latter serialises tasks with cloudpickle, which handles constructs that the standard pickle module does not support.

The code will look more or less like this:

from joblib import Parallel, delayed

# One predict_proba call per chunk; joblib returns the results as a list in chunk order
results = Parallel(n_jobs=4, backend='loky')(delayed(model.predict_proba)(chunk) for chunk in chunks)

Keep in mind that pickling methods bound to objects with the standard pickle module is generally not a healthy idea. dill could also work well here.

Lukasz Tracewski