Saving StandardScaler() model for use on new datasets

Question

How do I save a fitted StandardScaler() in sklearn? I need to make a model operational and don't want to reload the training data again and again just so StandardScaler can be refit before I apply it to the new data on which I want to make predictions.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

#standardizing after splitting
X_train, X_test, y_train, y_test = train_test_split(data, target)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

Related question: https://stackoverflow.com/questions/41993565/save-minmaxscaler-model-in-sklearn — Stephen, Aug 25 '22 at 18:27

score 34 · Accepted Answer · edited Apr 29 '20 at 12:57

You can use joblib's dump function to save the fitted StandardScaler. Here's a complete example for reference.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

data, target = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)

To save the fitted scaler sc, use the following:

from joblib import dump, load  # standalone joblib package
dump(sc, 'std_scaler.bin', compress=True)

This creates the file std_scaler.bin, which contains the fitted scaler.

To read the scaler back later, use load:

sc = load('std_scaler.bin')

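Note: the older import path sklearn.externals.joblib is deprecated and has been removed from newer scikit-learn releases. Install the standalone joblib package and import from it directly.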

How do I predict after loading the model data? — Amarnath Reddy Surapureddy, Nov 07 '22 at 08:38
Use sc.transform(X) to apply the scaling to new data. — Frank_Coumans, Apr 14 '23 at 08:31
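
The save, load and transform steps discussed above can be sketched end to end as follows. This is a minimal illustration rather than the answerer's own code: it assumes the standalone joblib package is installed, it writes to a temporary directory instead of the working directory, and X_new is a hypothetical stand-in for data that arrives at prediction time.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data, target = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=0)

# Fit on the training split only, then persist the fitted scaler.
sc = StandardScaler().fit(X_train)
scaler_path = os.path.join(tempfile.mkdtemp(), "std_scaler.bin")  # hypothetical location
dump(sc, scaler_path, compress=True)

# Later, e.g. in a separate process: load the scaler and apply it without refitting.
loaded_sc = load(scaler_path)
X_new = X_test  # stand-in for unseen data (assumption for this sketch)
X_new_std = loaded_sc.transform(X_new)

# The loaded scaler should reproduce the original scaler's output.
same_output = np.allclose(X_new_std, sc.transform(X_new))
print(same_output)
```

Note that only transform is called on the new data; calling fit or fit_transform again would overwrite the statistics learned from the training set.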

score 19 · Answer 2 · edited Apr 14 '23 at 12:54

Or, if you prefer pickle:

import pickle
with open('file/path/scaler.pkl','wb') as f:
    pickle.dump(sc, f)
with open('file/path/scaler.pkl','rb') as f:
    sc = pickle.load(f)

answered Dec 03 '19 at 20:33 by Kevin Mc · edited Apr 14 '23 at 12:54 by Frank_Coumans

This should be the accepted answer. Although I would prefer using `with open()..` instead of relying on the gc to close the file. — Niko Föhr, Apr 04 '21 at 10:41

score -1 · Answer 3 · answered Dec 21 '22 at 09:31

You can simply record mean_ and scale_.

So after you fit your StandardScaler (which computes the mean and scale), print out both values.

scaler = StandardScaler()
X = scaler.fit_transform(X)
print("Scaler mean: ", scaler.mean_)
print("Scaler scale: ", scaler.scale_)

In my example the output looks like this:

Scaler mean:  [ 9.52058421e-01 -6.98286619e-03 -4.14269899e-01 -1.40126971e-01 -8.17856250e+00 5.50322867e+01]
Scaler scale:  [ 0.6635306 0.29163553 0.65517668 23.05331473 36.66616542 43.53057184]

When you need your scaler again, e.g. for predicting, create a new one and set the recorded values (scaler1 is a new scaler, to be sure the old one is not used):

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler1 = StandardScaler()
scaler1.mean_ = np.array([ 9.52058421e-01, -6.98286619e-03, -4.14269899e-01, -1.40126971e-01, -8.17856250e+00, 5.50322867e+01])
scaler1.scale_ = np.array([ 0.6635306, 0.29163553, 0.65517668, 23.05331473, 36.66616542, 43.53057184])

# then use it to transform your data
X = scaler1.transform(X)

In my test the results were the same. Note: don't forget to add the commas in np.array([ ... , ...]), since the printed output omits them.

Cheers!

Please don't code this way and recommend it to others. It is a hacky solution that invites mistakes. — AlexK, Dec 23 '22 at 02:37
In Python, when a member variable contains an underscore, that means "don't touch it". — interoception, Aug 01 '23 at 20:29
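
For readers who still want to store only the two arrays, one alternative sketch is to write mean_ and scale_ to a file and apply the standardization formula (x - mean) / scale directly with NumPy, so that no attributes are assigned on an unfitted scaler and nothing is copied by hand. This assumes the scaler was fitted with its default settings (with_mean=True, with_std=True); the file name scaler_params.npz is hypothetical.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
scaler = StandardScaler().fit(X)

# Save only the learned statistics instead of copying printed numbers by hand.
params_path = os.path.join(tempfile.mkdtemp(), "scaler_params.npz")  # hypothetical path
np.savez(params_path, mean=scaler.mean_, scale=scaler.scale_)

# Later: reload the statistics and standardize manually.
params = np.load(params_path)
X_std_manual = (X - params["mean"]) / params["scale"]

# With default settings this should match StandardScaler.transform.
matches = np.allclose(X_std_manual, scaler.transform(X))
print(matches)
```

Saving the arrays at full precision also avoids the rounding that comes from retyping printed values.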
