I am trying to re-create the predictions of a trained model, but I don't know how to save the model. For example, I want to save a trained Gaussian process regressor model and recreate its predictions after training. The package I used to train the model is scikit-learn.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

kernel = DotProduct() + WhiteKernel()
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)
gpr.fit(X, y)
sentence
    Read this documentation https://scikit-learn.org/stable/modules/model_persistence.html – ashish14 May 13 '19 at 08:04
  • Are any of the shared answers acceptable? If the OP has found a better solution I'm (honestly) really interested to learn it. I'm exploring options beyond scikit-learn; [here](https://mathematica.stackexchange.com/questions/56319/how-can-i-export-my-learned-classiferfunction-and-predictorfunctions/245950#245950) is an entry I've worked with. – p._phidot_ Jun 27 '21 at 14:28

2 Answers

You can use:

1. pickle

from sklearn import svm
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = svm.SVC()
clf.fit(X, y)  

##########################
# SAVE-LOAD using pickle #
##########################
import pickle

# save
with open('model.pkl','wb') as f:
    pickle.dump(clf,f)

# load
with open('model.pkl', 'rb') as f:
    clf2 = pickle.load(f)

clf2.predict(X[0:1])

2. joblib

From scikit-learn documentation:

In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:

from sklearn import svm
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = svm.SVC()
clf.fit(X, y)  

##########################
# SAVE-LOAD using joblib #
##########################
import joblib

# save
joblib.dump(clf, "model.pkl") 

# load
clf2 = joblib.load("model.pkl")

clf2.predict(X[0:1])
  • One more thing: you can pass `compress=3` to `joblib.dump`; this will result in a smaller file. You can check the example [here](https://mljar.com/blog/save-load-random-forest/), where the compression decreases the file size ~5 times. – pplonski Jun 24 '20 at 18:28
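Following up on the comment above, here is a minimal sketch of saving with compression. The filenames and the SVC/iris setup are just placeholders; `compress` accepts an integer from 0 to 9 (higher means smaller files but slower dump/load):

```python
import joblib
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# save uncompressed and with zlib compression level 3
joblib.dump(clf, "model.pkl")
joblib.dump(clf, "model_compressed.pkl", compress=3)

# loading works the same way regardless of compression
clf2 = joblib.load("model_compressed.pkl")
```

The reloaded estimator behaves identically to the original, so `clf2.predict(X)` returns the same labels as `clf.predict(X)`.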
You can save and load the model with pickle, which serializes the fitted estimator and writes the serialized bytes to a file.

import pickle

# save the model to disk
filename = 'gpr_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(gpr, f)

# load the model from disk
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
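Putting this together with the question's GPR setup, a self-contained round-trip sketch (the `make_friedman2` toy data is only an assumption to make the snippet runnable):

```python
import pickle
import numpy as np
from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

X, y = make_friedman2(n_samples=100, noise=0, random_state=0)
kernel = DotProduct() + WhiteKernel()
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# save the fitted model to disk
with open('gpr_model.sav', 'wb') as f:
    pickle.dump(gpr, f)

# load it back
with open('gpr_model.sav', 'rb') as f:
    loaded = pickle.load(f)

# the reloaded model reproduces the original predictions exactly
assert np.allclose(gpr.predict(X), loaded.predict(X))
```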

Hope it helps!

Eric