When I run predict_proba on a DataFrame without multiprocessing, I get the expected behavior. The code is as follows:
probabilities_data = classname.perform_model_prob_predictions_nc(prediction_model, vectorized_data)
where perform_model_prob_predictions_nc is:
def perform_model_prob_predictions_nc(model, dataFrame):
    try:
        return model.predict_proba(dataFrame)
    except AttributeError:
        logging.error("AttributeError occurred", exc_info=True)
But when I try to run the same function using chunks and multiprocessing:
probabilities_data = classname.perform_model_prob_predictions(prediction_model, chunks, cores)
where perform_model_prob_predictions is:
def perform_model_prob_predictions(model, dataFrame, cores=4):
    try:
        with Pool(processes=cores) as pool:
            result = pool.map(model.predict_proba, dataFrame)
            return result
    except Exception:
        logging.error("Error occurred", exc_info=True)
I get the following error:
PicklingError: Can't pickle <function OneVsRestClassifier.predict_proba at 0x14b1d9730>: it's not the same object as sklearn.multiclass.OneVsRestClassifier.predict_proba
For reference:
cores = 4
vectorized_data = pd.DataFrame(...)
chunk_size = len(vectorized_data) // cores + cores
chunks = [df_chunk for g, df_chunk in vectorized_data.groupby(np.arange(len(vectorized_data)) // chunk_size)]
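To make the chunking above concrete, here is a small sketch of the same arithmetic in plain Python, without pandas. chunk_sizes is a hypothetical helper that is not part of my code; it only reproduces the group keys that np.arange(len(vectorized_data)) // chunk_size would generate and counts the rows per group.

```python
def chunk_sizes(n_rows, cores):
    # Same arithmetic as above: integer division, then add `cores`.
    chunk_size = n_rows // cores + cores
    # The group key for row i is i // chunk_size.
    keys = [i // chunk_size for i in range(n_rows)]
    # Count the rows that fall under each key, in key order.
    return [keys.count(k) for k in sorted(set(keys))]


print(chunk_sizes(100, 4))  # [29, 29, 29, 13]
print(chunk_sizes(10, 4))   # [6, 4]
```

With this formula the last chunk is smaller than the others, and for small inputs the number of chunks can be lower than cores.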