24

How do I save the StandardScaler() model in Sklearn? I need to make a model operational and don't want to load the training data again and again just so StandardScaler can learn from it before being applied to the new data on which I want to make predictions.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

#standardizing after splitting
X_train, X_test, y_train, y_test = train_test_split(data, target)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
Abhinav Bajpai
  • Related question: https://stackoverflow.com/questions/41993565/save-minmaxscaler-model-in-sklearn – Stephen Aug 25 '22 at 18:27

3 Answers

34

You can use joblib's dump function to save the fitted StandardScaler. Here's a complete example for reference.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

data, target = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)

If you want to save the sc StandardScaler, use the following:

from joblib import dump, load
dump(sc, 'std_scaler.bin', compress=True)

This creates the file std_scaler.bin and saves the fitted scaler in it.

To read the scaler back later, use load:

sc = load('std_scaler.bin')

Note: sklearn.externals.joblib is deprecated and no longer available in recent scikit-learn releases. Install and use the standalone joblib package instead, as in the import above.
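
As a quick sanity check, the round trip can be sketched as below: fit the scaler once, dump it, load it again as a serving process would, and confirm that the loaded scaler transforms held-out data identically. The temporary directory, the fixed random_state, and the name sc_loaded are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data, target = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=0)

# Fit once, on the training data only.
sc = StandardScaler().fit(X_train)

# Persist the fitted scaler; a temporary directory stands in for a real model store.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "std_scaler.bin")
    dump(sc, path, compress=True)
    sc_loaded = load(path)

# The loaded scaler carries the learned statistics, so no training data is needed.
same_output = np.allclose(sc.transform(X_test), sc_loaded.transform(X_test))
print(same_output)
```

Keep in mind that joblib files are pickle-based, so they should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.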

Federico Dorato
sukhbinder
19

Or, if you prefer pickle:

import pickle
with open('file/path/scaler.pkl','wb') as f:
    pickle.dump(sc, f)
with open('file/path/scaler.pkl','rb') as f:
    sc = pickle.load(f)
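
If the scaler has to travel as bytes rather than as a file (for example into a cache or a database column), the same idea should work with pickle.dumps and pickle.loads. A minimal sketch, where the toy arrays and the names payload and restored are made up for illustration:

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data, invented for this example.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
sc = StandardScaler().fit(X_train)

# Serialize to bytes and back without touching the filesystem.
payload = pickle.dumps(sc)
restored = pickle.loads(payload)

# The restored scaler standardizes new rows with the statistics learned above.
X_new = np.array([[2.0, 20.0], [4.0, 40.0]])
X_new_std = restored.transform(X_new)
print(X_new_std)
```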
Frank_Coumans
Kevin Mc
  • This should be the accepted answer. Although, I would prefer using `with open()..` instead of relying on the gc to close the file. – Niko Föhr Apr 04 '21 at 10:41
-1

You can simply record mean_ and scale_.

  1. After you fit your StandardScaler (which computes the mean and scale), print out both attributes.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)
print("Scaler mean: ", scaler.mean_)
print("Scaler scale: ", scaler.scale_)

In my example the output looks like this:

Scaler mean: [ 9.52058421e-01 -6.98286619e-03 -4.14269899e-01 -1.40126971e-01 -8.17856250e+00 5.50322867e+01]
Scaler scale: [ 0.6635306 0.29163553 0.65517668 23.05331473 36.66616542 43.53057184]

  2. When you need your scaler again, e.g. for predicting (scaler1 is a new scaler, to be sure the old one is not used):
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler1 = StandardScaler()
scaler1.mean_ = np.array([ 9.52058421e-01, -6.98286619e-03, -4.14269899e-01, -1.40126971e-01, -8.17856250e+00, 5.50322867e+01])
scaler1.scale_ = np.array([ 0.6635306, 0.29163553, 0.65517668, 23.05331473, 36.66616542, 43.53057184])

# then use it to transform your data
X = scaler1.transform(X)

In my test the results were the same. Note: don't forget to add the commas in np.array([ ... , ...]), since the printed output omits them.
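
With the default settings (with_mean=True, with_std=True), transform computes (X - mean_) / scale_, so the two saved arrays are enough to standardize new data even without a StandardScaler object. A minimal sketch, assuming a made-up toy array and a hypothetical JSON payload as the storage format, which also avoids retyping the numbers by hand:

```python
import json

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data, invented for this example.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_train)

# Store the learned statistics as plain JSON (hypothetical storage format).
params = json.dumps({"mean": scaler.mean_.tolist(), "scale": scaler.scale_.tolist()})

# Later: rebuild the arrays and apply the standardization formula directly.
loaded = json.loads(params)
mean = np.array(loaded["mean"])
scale = np.array(loaded["scale"])

X_new = np.array([[2.0, 20.0], [4.0, 40.0]])
X_new_std = (X_new - mean) / scale

# Compare against the original fitted scaler.
matches = np.allclose(X_new_std, scaler.transform(X_new))
print(matches)
```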

Cheers!