
I am trying to create an ML model in an Azure ML Workspace using a Jupyter notebook. I am not using the AutoML feature or the Designer provided by Azure; I want to run the complete code prepared locally.

My ML model uses 3 different algorithms. I am confused about how I can save all the objects in one single pickle file, which I can utilize later in the "Inference configuration" and "Score.py" file. Also, once saved, how can I access them in "Score.py" (the entry script that contains the driver code)?

Currently I am using following method:

import pickle

f = 'prediction.pkl'
all_models = [Error_Message_countvector, ErrorMessage_tfidf_fit, model_naive]
with open(f, 'wb') as files:
    pickle.dump(all_models, files)

and to load and access the objects:

with open(f, 'rb') as files:
    loaded_model = pickle.load(files)

cv_output = loaded_model[0].transform(input_series)
tfidf_output = loaded_model[1].transform(cv_output)
loaded_model_prediction = loaded_model[2].predict(tfidf_output)

Somehow, this method works fine when I run it in the same cell as the rest of the code, but it throws an error when I deploy the complete model.

My "Score.py" file looks something like this:

import json
from azureml.core.model import Model
import joblib
import pandas as pd

def init():
    global prediction_model 
    prediction_model_path = Model.get_model_path("prediction")    
    prediction_model = joblib.load(prediction_model_path)     

def run(data):
    try:
        data = json.loads(data)
        input_string = str(data['errorMsg']).strip()
        input_series = pd.Series(input_string)
        cv_output = prediction_model[0].transform(input_series)
        tfidf_output = prediction_model[1].transform(cv_output)
        result = prediction_model[2].predict(tfidf_output)
        return {'response': result}

    except Exception as e:
        error = str(e)
        return {'response' : error }

and the error received on deployment is:

Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Error in entry script, AttributeError: module '__main__' has no attribute 'text_cleaning', please run print(service.get_logs()) to get details.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Error in entry script, AttributeError: module '__main__' has no attribute 'text_cleaning', please run print(service.get_logs()) to get details."
    }
  ]
}

Can anyone help me understand the issue, or figure out whether something is missing/wrong in the code?

What is the right way of saving multiple algorithm objects in one single pickle file?

desertnaut
  • You can refer to [Saving and loading multiple objects in pickle file?](https://stackoverflow.com/questions/20716812/saving-and-loading-multiple-objects-in-pickle-file) and [Read Pickle File as a Pandas DataFrame](https://datascienceparichay.com/article/read-pickle-file-as-pandas-dataframe/) – Ecstasy Nov 11 '21 at 04:43

1 Answer


> Can anyone help me understand the issue or figure out if there is something missing/wrong in the code?

From your error message:

"Error in entry script, AttributeError: module '__main__' has no attribute 'text_cleaning'..."

It seems that your first step (cv_output = prediction_model[0].transform(...)) references a function called text_cleaning (for example, as a custom preprocessor or tokenizer). pickle stores only a *reference* to such functions, not their code, so text_cleaning must be defined in, or imported by, your scoring script; otherwise unpickling fails with exactly this AttributeError.
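To illustrate the mechanism (module and function names here are hypothetical, standing in for your text_cleaning helper): pickle records the function's module path, so the function is recoverable only if that module is importable at load time. Defining the helper in a notebook cell puts it in `__main__`, which does not exist in the deployed container.

```python
import os
import pickle
import sys
import tempfile
import textwrap

# Write a small importable module containing the cleaning function,
# instead of defining it in __main__ (i.e. a notebook cell).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "text_utils.py"), "w") as fh:
    fh.write(textwrap.dedent("""
        def text_cleaning(text):
            return text.lower().strip()
    """))
sys.path.insert(0, tmp_dir)

from text_utils import text_cleaning

# Pickling the function records the reference "text_utils.text_cleaning";
# any process that can import text_utils can unpickle it.
blob = pickle.dumps(text_cleaning)
restored = pickle.loads(blob)
print(restored("  Hello World  "))
```

In Azure ML terms: ship the module containing text_cleaning alongside score.py (e.g. via the source directory of the inference configuration) and import it at the top of the scoring script.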

> What is the right way of saving multiple algorithm objects in one single pickle file?

If you want to persist a sequence of transformations like the one in your example, the best practice is to use the Pipeline class from scikit-learn:

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
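A minimal sketch, assuming your three objects are a CountVectorizer, a TfidfTransformer, and a Naive Bayes classifier (the tiny corpus and labels below are illustrative placeholders, not your data). A fitted Pipeline is a single object, so one pickle file holds all three steps, and prediction becomes a single call instead of three chained ones:

```python
import pickle
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Chain the three steps into one estimator.
pipe = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# Placeholder training data.
texts = ["disk full error", "network timeout",
         "disk write failure", "connection refused"]
labels = ["storage", "network", "storage", "network"]
pipe.fit(texts, labels)

# One pickle file now contains all three fitted objects.
with open("prediction.pkl", "wb") as fh:
    pickle.dump(pipe, fh)

with open("prediction.pkl", "rb") as fh:
    model = pickle.load(fh)

# score.py then needs only: model.predict(input_list)
print(model.predict(["disk read error"]))
```

This also sidesteps the index-based `loaded_model[0]` / `[1]` / `[2]` access in your scoring script.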

nferreira78