I am trying to save and load a scikit-learn model, but I run into problems when the model is saved under one Python version and loaded under another. Here is what I have tried:

  1. Using pickle to save a model in Python 3 and deserialize it in Python 2. This works for some models, such as LR and SVM, but fails for KNN.

    >>> pickle.load(open("inPy3.pkl", 'rb')) #KNN model
    ValueError: non-string names in Numpy dtype unpickling
    
  2. I also tried serializing and deserializing to JSON with jsonpickle, but got the following error.

    data = jsonpickle.encode(lr) #lr = logisticRegression Model
    jsonpickle.decode(data)
    AttributeError: 'dict' object has no attribute '__name__'
    

I would also like to know whether there is a utility for serializing and deserializing scikit-learn model objects to a human-readable format (JSON, XML, protobuf, etc.).

rishabh.bhardwaj
  • I suspect this may be an issue with the pickling protocol you use. https://docs.python.org/3/library/pickle.html#pickle-protocols If you are going to pickle something in python 3 and need to use it in Python 2, use `protocol=2` keyword argument in the `pickle.dump` method, which is the highest protocol understood by pickle in Python 2. – juanpa.arrivillaga Jul 12 '16 at 05:58
  • @juanpa.arrivillaga I tried this but I get the same error. In Python 3: `pickle.dump(neigh, open("knn_ser_py3.pkl", 'wb'), protocol=2, fix_imports=True)`. In Python 2: `reconstructed = pickle.load(open("knn_ser_py3.pkl", 'rb'))` raises `ValueError: non-string names in Numpy dtype unpickling`. – rishabh.bhardwaj Jul 12 '16 at 06:19

1 Answer

Instead of pickling whole models, you can extract and store only their fitted coefficients. To restore a model, load the coefficients and initialize a fresh model with them.

This is related to the question about upgrading sklearn. A similar approach should work across Python versions.
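As a rough illustration of this idea, here is a minimal sketch for a logistic regression model, using JSON so the stored file is also human readable. The helper names `linear_model_to_json` and `linear_model_from_json` are made up for this example and are not part of scikit-learn. It also assumes that `coef_`, `intercept_` and `classes_` are all that `predict` needs, which holds for `LogisticRegression` but may not hold for other estimators.

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression


def linear_model_to_json(model):
    """Store the fitted attributes of a linear classifier as a JSON string.

    Hypothetical helper: assumes coef_, intercept_ and classes_ are enough
    to reproduce the model's predictions.
    """
    params = {
        "coef_": model.coef_.tolist(),
        "intercept_": model.intercept_.tolist(),
        "classes_": model.classes_.tolist(),
    }
    return json.dumps(params)


def linear_model_from_json(text):
    """Create a fresh LogisticRegression and attach the stored attributes."""
    params = json.loads(text)
    model = LogisticRegression()
    for name, value in params.items():
        # Convert the plain lists back to NumPy arrays before attaching them.
        setattr(model, name, np.asarray(value))
    return model


# Tiny made-up dataset, only to exercise the round trip.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

original = LogisticRegression().fit(X, y)
payload = linear_model_to_json(original)    # plain JSON text, no pickle involved
restored = linear_model_from_json(payload)

same_predictions = bool(np.array_equal(original.predict(X), restored.predict(X)))
```

Because the payload contains only lists of numbers, it does not depend on the pickle protocol or on how NumPy dtypes are pickled. Note that KNN has no coefficients: it keeps the training data itself, so for that model you would probably need to store the training set and refit on the other side.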

Julia Meshcheryakova