
I have an Azure function written in Python which has a simple purpose: return a prediction for a new observation based on a model I have trained, tested, and stored as a BLOB. I created the model using a Jupyter notebook and uploaded it to Azure BLOB Storage. I can read the model file, but when I try to unpickle it I get an error: Exception: UnpicklingError: invalid load key, '\xef'.

I'm new to ML and Azure functions so I'm not sure where to start. I've tried loading the model locally and it works fine. I've tried downloading the file back from Azure Storage and it works fine.

The PKL file is generated from a notebook like this:

with open("diabetes-model.pkl", "wb") as f:
    pickle.dump(model, f)

In my Azure function I'm passing a func.InputStream to a method that looks like this:

import pickle

def do_prediction(modelFileStream):
    mod = modelFileStream.read()  # raw bytes of the .pkl blob
    modelFileStream.close()
    model = pickle.loads(mod)     # raises UnpicklingError
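A minimal sketch of a guard that could sit in front of `pickle.loads` in a function like `do_prediction`, so that a mangled blob fails with a more descriptive message. `load_model_bytes` is a hypothetical helper, not part of the Azure Functions SDK. It relies on two facts: pickles written with protocol 2 or higher start with the byte `b'\x80'`, and `b'\xef\xbf\xbd'` is the UTF-8 encoding of U+FFFD, the replacement character a lossy text decode leaves behind. It assumes the corruption took that form, which is what the dump in the question looks like.

```python
import pickle

# Hypothetical helper: unpickle model bytes, but fail early with a clearer
# message if they start with U+FFFD (UTF-8 b'\xef\xbf\xbd'), the mark left
# when binary data is decoded as text with replacement.
def load_model_bytes(data: bytes):
    if data.startswith(b"\xef\xbf\xbd"):
        raise ValueError(
            "Blob starts with U+FFFD (b'\\xef\\xbf\\xbd'): the bytes were probably "
            "decoded as UTF-8 text somewhere before reaching this function."
        )
    return pickle.loads(data)


if __name__ == "__main__":
    good = pickle.dumps({"max_depth": 12}, protocol=3)
    print(load_model_bytes(good))  # {'max_depth': 12}

    # Simulate the suspected text round trip and show the clearer error.
    mangled = good.decode("utf-8", errors="replace").encode("utf-8")
    try:
        load_model_bytes(mangled)
    except ValueError as exc:
        print(exc)
```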

The file starts like this in the debugger (it's almost 400KB):

b'\xef\xbf\xbd\x03cxgboost.sklearn\nXGBClassifier\nq\x00)\xef\xbf\xbdq\x01}q\x02(X\t\x00\x00\x00max_depthq\x03K\x0cX\r\x00\x00\x00learning_rateq\x04G?\xef\xbf\xbdz\xef\xbf\xbdG\xef\xbf\xbd\x14{X\x0c\x00\x00\x00n_estimatorsq\x05M,\x01X\t\x00\x00\x00verbosityq\x06K\x01X\x06\x00\x00\x00silentq\x07NX\t\x00\x00\x00objectiveq\x08X\x0f\x00\x00\x00binary:logisticq\tX\x07\x00\x00\x00boosterq\nX\x06\x00\x00\x00gbtreeq\

The error is: Exception: UnpicklingError: invalid load key, '\xef'.
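The dump points at what probably happened. A pickle written with protocol 2 or higher begins with the PROTO opcode `b'\x80'` followed by the protocol number, and `0x80` is not a valid first byte of a UTF-8 sequence. If the file is decoded as UTF-8 text with replacement at some point and then encoded back to bytes, that byte becomes U+FFFD, stored as `b'\xef\xbf\xbd'`, which is exactly the `'\xef'` load key in the error. The sketch below reproduces this locally. It assumes the corruption was such a lossy decode/encode round trip; the dump is consistent with that but does not prove it. The small dict is a stand-in for the XGBClassifier.

```python
import pickle

# Stand-in for the trained model; any picklable object shows the same effect.
original = pickle.dumps({"max_depth": 12}, protocol=3)
print(original[:2])  # b'\x80\x03' : PROTO opcode + protocol number

# Simulate the suspected corruption: bytes decoded as UTF-8 text (invalid
# sequences replaced by U+FFFD) and then encoded back to bytes.
mangled = original.decode("utf-8", errors="replace").encode("utf-8")
print(mangled[:4])  # b'\xef\xbf\xbd\x03' : same prefix as the blob in the question

try:
    pickle.loads(mangled)
except pickle.UnpicklingError as exc:
    print(exc)  # the first byte is now 0xEF, which is not a pickle opcode
```

The damage is probably not limited to the header: in the dump, the byte after `)` (where a `NEWOBJ` opcode, `0x81`, would normally be) and the float bytes after `G` also show up as `\xef\xbf\xbd`. If so, the original bytes are lost, so the blob has to reach the function as untouched binary rather than being repaired afterwards.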

I'm guessing there is some kind of an encoding issue here. I've seen some guidance that the contents should be Base64 encoded before being written, but that seems inefficient to me.
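For the Base64 route, the cost is easy to quantify: Base64 emits 4 ASCII characters for every 3 input bytes, so a model of roughly 400 KB grows by about a third. In exchange, the encoded form is plain ASCII and survives text-mode handling unchanged. A minimal sketch with the standard `base64` module, using a toy dict in place of the real model; note that this works around the symptom rather than finding where the bytes are being decoded as text.

```python
import base64
import pickle

model_bytes = pickle.dumps({"max_depth": 12}, protocol=3)  # stand-in for the real model

# Encode before upload: the result is pure ASCII, so a text round trip cannot corrupt it.
encoded = base64.b64encode(model_bytes)
text = encoded.decode("ascii")

# In the function: decode back to the original bytes, then unpickle.
restored = pickle.loads(base64.b64decode(text))
print(restored)  # {'max_depth': 12}

# Size overhead: 4 output characters per 3 input bytes, rounded up to whole groups.
n = 400_000
print(len(base64.b64encode(bytes(n))))  # 533336
```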

Would love some guidance on what is going on or what to try next.

Matt Webster

1 Answer


I suspect this is related to how the file is opened and then loaded in the function.

To load a pickled file back into a Python program, use the **open()** function again, but with 'rb' as the second argument instead of 'wb'. The r stands for read mode and the b for binary mode, because you are reading a binary file. Assign the file object to infile, then call pickle.load() with infile as its argument:

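import pickle

filename = "diabetes-model.pkl"  # path to the pickled file
# Open in read-binary mode; the with-block closes infile automatically
with open(filename, 'rb') as infile:
    new_dict = pickle.load(infile)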
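The same binary-mode round trip can be sketched end to end with a temporary file so it runs standalone. The dict is a made-up stand-in for the model. The sketch also shows the in-memory case that matches the question, where the bytes are already in hand (as after `modelFileStream.read()`): wrap them in `io.BytesIO` for `pickle.load`, or pass them straight to `pickle.loads`.

```python
import io
import os
import pickle
import tempfile

model = {"max_depth": 12, "objective": "binary:logistic"}  # stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "diabetes-model.pkl")

    # Write in binary mode ('wb'), as in the notebook.
    with open(filename, "wb") as outfile:
        pickle.dump(model, outfile)

    # Read back in binary mode ('rb') and unpickle from the file object.
    with open(filename, "rb") as infile:
        from_file = pickle.load(infile)

    # Bytes already held in memory: BytesIO + pickle.load, or pickle.loads directly.
    with open(filename, "rb") as infile:
        raw = infile.read()
    from_stream = pickle.load(io.BytesIO(raw))
    from_bytes = pickle.loads(raw)

print(from_file == from_stream == from_bytes == model)  # True
```

Either form only works if the bytes are still exactly what `pickle.dump` wrote; opening in 'rb' mode cannot undo corruption that happened before the data reached the reader.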

Please check and see if it helps.

Additional reference:

Saving and loading objects and using pickle

https://www.datacamp.com/community/tutorials/pickle-python-tutorial

Mohit Verma