Is there a way to save the preprocessing objects in scikit-learn?

Question

I am building a neural net to make predictions on new data in the future. I first preprocess the training data using sklearn.preprocessing, then train the model, make some predictions, and close the program. When new data comes in later, I have to apply the same preprocessing scales to it before feeding it into the model. Currently, I have to load all of the old data, fit the preprocessors again, and then transform the new data with them. Is there a way to save the preprocessing objects (like sklearn.preprocessing.StandardScaler) so that I can just load the old objects rather than having to remake them?

This is just a Python object; you can pickle it like any other Python object. - lejlot, Mar 16 '17 at 19:38
You can combine all of your preprocessing and training in a Pipeline object and then simply pickle it using joblib (recommended for scikit-learn). - Vivek Kumar, Mar 17 '17 at 02:37

score 3 · Answer 1 · answered Mar 24 '20 at 22:44

Besides pickle, you can also use joblib to do this. As stated in section 3.4, Model persistence, of the scikit-learn manual:

In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:

from joblib import dump, load
dump(clf, 'filename.joblib')

Later you can load back the pickled model (possibly in another Python process) with:

clf = load('filename.joblib')
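
Applied to the question's case, the same two calls can be used on a fitted scaler. The following is a minimal sketch, not a definitive implementation: the training data, the variable names, and the temporary file path are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 4 samples, 2 features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Fit the scaler once, on the training data only.
scaler = StandardScaler().fit(X_train)

# Save the fitted scaler. The path is an arbitrary choice for this sketch.
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
dump(scaler, scaler_path)

# Later (possibly in another Python process): load the scaler and
# transform new data with the statistics learned from the old data.
loaded_scaler = load(scaler_path)
X_new = np.array([[2.5, 25.0]])
X_new_scaled = loaded_scaler.transform(X_new)
print(X_new_scaled)
```

The loaded object carries the fitted attributes (such as mean_ and scale_), so the old data does not need to be loaded again. The scikit-learn documentation cautions that a saved estimator may not load correctly under a different scikit-learn version, so it can help to record the version used when saving.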

For more information, see these related posts: "Saving StandardScaler() model for use on new datasets" and "Save MinMaxScaler model in sklearn".

score 2 · Accepted Answer · answered Mar 16 '17 at 19:46

As mentioned by lejlot, you can use the pickle library to save the trained network to a file on your hard drive; then you only need to load it to start making predictions. The same approach works for a fitted preprocessing object such as StandardScaler.

Here is an example of how to use pickle to save and load Python objects:

import pickle
import numpy as np

npTest_obj = np.asarray([[1,2,3],[6,5,4],[8,7,9]])

strTest_obj = "pickle example XXXX"


if __name__ == "__main__":
    # Store each object in its own file; "with" closes the file afterwards.
    with open("npObject.p", "wb") as f:
        pickle.dump(npTest_obj, f)
    with open("strObject.p", "wb") as f:
        pickle.dump(strTest_obj, f)

    # Read the objects back from the files.
    with open("strObject.p", "rb") as f:
        str_readObj = pickle.load(f)
    with open("npObject.p", "rb") as f:
        np_readObj = pickle.load(f)
    print(str_readObj)
    print(np_readObj)
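
The pipeline suggestion from the comments can be combined with pickle in the same way. The sketch below is only an illustration under stated assumptions: LinearRegression is a stand-in for the neural net (the question does not say which library the network uses), and the data, names, and file path are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with target y = 2 * x0 + x1.
X_train = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]])
y_train = 2.0 * X_train[:, 0] + X_train[:, 1]

# The scaler and the model are fitted together and saved as one object.
# LinearRegression is a stand-in for the real model in this sketch.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# The path is an arbitrary choice for this sketch.
model_path = os.path.join(tempfile.mkdtemp(), "model.p")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later: load the pipeline. New data is scaled with the stored
# training statistics before it reaches the model.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

X_new = np.array([[2.5, 25.0]])
print(loaded_model.predict(X_new))
```

Saving the whole pipeline keeps the preprocessing and the model in sync, since there is only one file to load. As with any pickle file, it should only be loaded from a source you trust.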
