I am currently trying to use the AutoMLStep to train a machine learning model, register it in the workspace, and use it for inference as a deserialized model. My current project folder/file structure is the following:
project/
│
├── src/
│
├── data_prep.py
├── register_model.py
├── pipeline.py
(I am mostly basing my work on https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-use-automlstep-in-pipelines.) In the pipeline.py script, I create my pipeline's PythonScriptStep objects (two in this case) as well as the AutoMLStep. The AutoMLStep is defined as follows:
train_step = AutoMLStep(name='AutoML_Classification',
                        automl_config=automl_config,
                        passthru_automl_config=False,
                        outputs=[metrics_data, model_data],
                        allow_reuse=True)
Where:
metrics_data = PipelineData(name='metrics_data',
                            datastore=blobstore,
                            pipeline_output_name=metrics_output_name,
                            training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='model_data',
                          datastore=blobstore,
                          pipeline_output_name='best_model_ticketing',
                          training_output=TrainingOutput(type='Model'))
For the register_model.py script, which is the last step in my pipeline sequence, I want to register the model, and use it to make predictions. I've tried the following:
from azureml.core.model import Model, Dataset
from azureml.core.run import Run, _OfflineRun
from azureml.core import Workspace
import argparse
import os
import pickle
from azureml.pipeline.core import PipelineRun
from azureml.pipeline.steps.automl_step import AutoMLStepRun
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)
parser.add_argument("--model_path", required=True)
args = parser.parse_args()
run = Run.get_context()
ws = Workspace.from_config() if type(run) == _OfflineRun else run.experiment.workspace
pipeline_run_id = run.parent.id
pipeline_run = PipelineRun(experiment=run.experiment, run_id=pipeline_run_id) # This is the Pipeline run, that orchestrates the overall pipeline
best_model_output = pipeline_run.get_pipeline_output('best_model_ticketing')
num_file_downloaded = best_model_output.download('.', show_progress=True)
model_filename = best_model_output._path_on_datastore
with open(model_filename, "rb") as f:
    best_model = pickle.load(f)
file_name = f"../outputs/model/{args.model_name}.pkl"
os.makedirs(os.path.dirname(file_name), exist_ok=True)
with open(file_name, "wb") as out_file:
    pickle.dump(best_model, out_file)  # pickle.dump takes an open file object, not a filename
print("Pickeling of model complete")
# Register model in AzureML
model = Model.register(model_path = file_name,
                       model_name = args.model_name,
                       description = "Model, with Hyperparameters Tuned",
                       workspace = ws)
Which leads to:
Traceback (most recent call last):
  File "src/register_model.py", line 26, in <module>
    best_model = pickle.load(f)
EOFError: Ran out of input
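For what it's worth, the same exception can be reproduced without Azure ML: `pickle.load` raises `EOFError: Ran out of input` when the file it reads from is empty. The sketch below only illustrates that behavior and is not my pipeline code; `load_pickle_checked` and the temporary file names are hypothetical, introduced just for this example. It makes me suspect that the file at `model_filename` is empty or is not the file that was actually downloaded, although I have not confirmed that.

```python
import os
import pickle
import tempfile


def load_pickle_checked(path):
    """Load a pickle file, failing early with a clearer message if it is empty."""
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"{path} is empty (0 bytes); nothing to unpickle")
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Reproduce the error from the traceback with a zero-byte file.
    empty_path = os.path.join(tmp, "empty.pkl")
    open(empty_path, "wb").close()

    try:
        with open(empty_path, "rb") as f:
            pickle.load(f)
        error_message = None
    except EOFError as exc:
        error_message = str(exc)

    # A non-empty pickle round-trips as expected.
    good_path = os.path.join(tmp, "good.pkl")
    with open(good_path, "wb") as f:
        pickle.dump({"answer": 42}, f)
    restored = load_pickle_checked(good_path)

print(error_message)
print(restored)
```

A size check like the one in `load_pickle_checked` would at least turn the opaque `EOFError` into a message that names the offending path.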
Ideally, to integrate this with my current project script, I'd like to use a similar approach to this:
# Begin pickling the model
# non AutoML training done prior to this to create best_xgb_model in same script
print("Begin pickling the model")
model_name = args.registered_model_name
# save model in ./model
print("Exporting model as a .pkl")
import os
file_name = f"../outputs/model/{model_name}.pkl"
os.makedirs(os.path.dirname(file_name), exist_ok=True)
joblib.dump(value = best_xgb_model, filename = file_name)
print("Pickeling of model complete")
# Register model in AzureML
print("Registering Model with AzureML")
model = Model.register(
    model_path = file_name,
    model_name = model_name,
    description = "Model, with Hyperparameters Tuned",
    workspace = ws
)
Which allows the model to be used this way:
model_path = Model.get_model_path(model_name = args.registered_model_name, _workspace=ws) # get path of *latest* model
# Deserialize the model file back into xgb model
best_xgb_model = joblib.load(model_path)
The bottom line: how can I retrieve the best fitted model from the AutoMLStep in the following step (register_model.py) in such a way that I can call joblib.dump on it, register it, and load it for predictions? I've also tried registering the model directly, but that doesn't save the model as a .pkl file, and I wasn't able to use it for inference via get_model_path.
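To make the goal concrete, here is a rough sketch of the local part of what I am after, with the Azure ML calls left out. Everything in it is an assumption for illustration: `find_model_file` and `export_model` are hypothetical helpers, the nested folder layout is made up and stands in for whatever `download('.')` actually produces, and the standard-library `pickle` is swapped in for `joblib` so the sketch has no dependencies (`joblib.dump(value, filename)` takes a path, whereas `pickle.dump` takes an open file object).

```python
import os
import pickle
import tempfile


def find_model_file(download_dir, suffix=".pkl"):
    """Return the first non-empty file ending in `suffix` under download_dir, or None."""
    for root, _dirs, files in os.walk(download_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            if name.endswith(suffix) and os.path.getsize(path) > 0:
                return path
    return None


def export_model(model, model_name, output_dir):
    """Serialize `model` to <output_dir>/<model_name>.pkl and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    file_name = os.path.join(output_dir, f"{model_name}.pkl")
    with open(file_name, "wb") as f:
        pickle.dump(model, f)  # pickle.dump takes an open file, not a filename
    return file_name


with tempfile.TemporaryDirectory() as tmp:
    # Made-up stand-in for the downloaded pipeline output: a nested folder with one pickle.
    nested = os.path.join(tmp, "azureml", "some_run_id", "model_data")
    os.makedirs(nested)
    with open(os.path.join(nested, "model.pkl"), "wb") as f:
        pickle.dump({"kind": "placeholder model"}, f)

    # Locate the file by searching instead of relying on a private attribute.
    found = find_model_file(tmp)
    with open(found, "rb") as f:
        best_model = pickle.load(f)

    # Re-export under the name that would then be passed to Model.register.
    exported = export_model(best_model, "my_model", os.path.join(tmp, "outputs", "model"))
    with open(exported, "rb") as f:
        reloaded = pickle.load(f)
    exported_name = os.path.basename(exported)

print(exported_name)
print(reloaded)
```

The path returned by `export_model` is what I would then pass as `model_path` to `Model.register`, the same way as in my non-AutoML script above.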
Help would be greatly appreciated.