26

I have an object that contains within it two scikit-learn models, an IsolationForest and a RandomForestClassifier, that I would like to pickle and later unpickle and use to produce predictions. Apart from the two models, the object contains a couple of StandardScalers and a couple of Python lists.

Pickling this object using joblib is unproblematic, but when I try to unpickle it later I get the following exception:

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/(...)/python3.5/site-packages/joblib/numpy_pickle.py", line 578, in load
   obj = _unpickle(fobj, filename, mmap_mode)
 File "/home/(...)/python3.5/site-packages/joblib/numpy_pickle.py", line 508, in _unpickle
   obj = unpickler.load()
 File "/usr/lib/python3.5/pickle.py", line 1039, in load
   dispatch[key[0]](self)
KeyError: 0

The same application both pickles and unpickles the object, so the versions of scikit-learn, joblib and other libraries are the same. I'm not sure where to start debugging, given the vague error. Any ideas or pointers?

haroba

6 Answers

36

The solution to this was pretty banal: without being aware of it, I was using the version of joblib in sklearn.externals.joblib for the pickling, but a newer version of joblib for unpickling the object. The problem was resolved when I used the newer version of joblib for both tasks.

haroba
14

In my case, I exported the model using from sklearn.externals import joblib and then tried to load it using import joblib.

Marcos Paulo
8

Mine was interesting. I was working with git-lfs, so the files in my checkout were not the actual model files (Git LFS leaves small text pointer files in place until the real content is fetched) and joblib couldn't open them. I needed to run git lfs pull to get the actual files. So if you are using compatible joblib versions, make sure your files have not been changed somehow!
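A quick way to rule this case out is to look at the first bytes of the file before handing it to joblib. The sketch below is only an illustration: the helper name looks_like_lfs_pointer is made up here, and it relies on Git LFS pointer files starting with the spec line "version https://git-lfs.github.com/spec/v1".

```python
# First line of every Git LFS pointer file (version 1 of the pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer file.

    Hypothetical helper for illustration; it only inspects the file header
    and does not touch git or joblib.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If this returns True for a model file, running git lfs pull first and then loading the file is the thing to try.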

Iman Mirzadeh
1

For me, the same version of joblib was used to dump and load, but I was saving the file under Python 3.7.4 and attempting to load it with Python 3.7.6, which raised the same KeyError.
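Since the error itself does not say which versions differ, it can help to write the environment down next to the dumped file and compare it at load time. This is a minimal sketch using only the standard library; the function names and the default package list are assumptions for illustration, not part of joblib or scikit-learn.

```python
import json
import platform
from importlib import metadata


def environment_fingerprint(packages=("joblib", "scikit-learn", "numpy")):
    """Collect the Python version and the installed versions of `packages`."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return versions


def save_fingerprint(path, packages=("joblib", "scikit-learn", "numpy")):
    """Write the current fingerprint as JSON, e.g. next to the dumped model."""
    with open(path, "w") as fh:
        json.dump(environment_fingerprint(packages), fh, indent=2)


def fingerprint_mismatches(path):
    """Return {name: (saved, current)} for every entry that differs."""
    with open(path) as fh:
        saved = json.load(fh)
    current = environment_fingerprint([k for k in saved if k != "python"])
    return {k: (v, current.get(k)) for k, v in saved.items() if v != current.get(k)}
```

Calling save_fingerprint right after the dump and fingerprint_mismatches right before the load gives an explicit list of what changed, such as a 3.7.4 versus 3.7.6 Python difference, instead of a bare KeyError.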

BarefootDev
0

In my case, I was trying to load an XGBoost model. I found out that XGBoost models cannot be loaded the same way as the other sklearn models, so I did the following:

from xgboost import XGBClassifier
import joblib

def get_model(model_path):
    if 'xgb' in model_path:
        # XGBoost models are restored with XGBoost's own loader
        model = XGBClassifier()
        model.load_model(model_path)
    else:
        # get_obj was not defined in the original snippet;
        # joblib.load is assumed to be what it did
        model = joblib.load(model_path)
    return model

xgb = get_model('Models/xgb_v1.pkl')  # an XGBoost model

tree = get_model('Models/dt_v1.pkl')  # a decision tree
Dharman
onofricamila
0

I was trying to load years-old joblib files, which gave multiple levels of errors, depending on the hack I used to bypass them.

As joblib versions increased, the hacks stopped working and I had to create a conda environment specifically for a scikit-learn release older than 0.23, like this:

conda create -n outdated "scikit-learn<0.23"

Afterwards, I was able to open the files and save them differently. Sometimes this meant re-saving the data with the standalone joblib (import joblib) instead of the one bundled with sklearn; sometimes it meant using pickle; sometimes it meant using pandas' DataFrame.to_csv.

The solution was specific to the data file being re-saved for posterity.
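As a rough illustration of that per-file decision, here is a standard-library sketch for plain data that has already been loaded in the old environment. The function name resave and the rule "list of rows goes to CSV, everything else to pickle" are assumptions made for this example; fitted sklearn models would still depend on the sklearn version even after re-pickling.

```python
import csv
import pickle
from pathlib import Path


def resave(obj, stem):
    """Re-save an already loaded object without depending on joblib.

    A non-empty list of lists/tuples is written as CSV; anything else is
    written with the standard pickle module. Returns the path written.
    """
    stem = Path(stem)
    is_table = (
        isinstance(obj, list)
        and len(obj) > 0
        and all(isinstance(row, (list, tuple)) for row in obj)
    )
    if is_table:
        path = stem.with_suffix(".csv")
        with path.open("w", newline="") as fh:
            csv.writer(fh).writerows(obj)
    else:
        path = stem.with_suffix(".pkl")
        with path.open("wb") as fh:
            # Protocol 4 is readable by Python 3.4 and later.
            pickle.dump(obj, fh, protocol=4)
    return path
```

For example, resave(rows, "archive/features") would write archive/features.csv if rows is a list of rows, assuming the archive directory already exists.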

ExoWanderer