
I created and saved a neural network in a script called "training_net.py". As recommended on the scikit-learn website (http://scikit-learn.org/stable/modules/neural_networks_supervised.html#tips-on-practical-use), I scaled the training set and applied the same scaler to the test set.

Now I have a script, called "prediction.py", that takes as input a vector of parameters and the neural network created in "training_net.py" and gives a classification as output.

My question is about scaling the input in "prediction.py". I assume I should scale the input with the same transformation used in "training_net.py", but I don't understand how to retrieve the transformation parameters from the fitted scaler.

When I call `scaler.get_params()` I just get the following: `{'copy': True, 'with_mean': True, 'with_std': True}`
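For context, `get_params()` only returns the constructor settings; the statistics actually learned from the data live in fitted attributes such as `mean_` and `scale_`. A minimal sketch (the toy data here is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy data: three samples, two features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X)

print(scaler.get_params())  # constructor settings only: copy, with_mean, with_std
print(scaler.mean_)         # per-feature means learned from X
print(scaler.scale_)        # per-feature standard deviations learned from X
```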

Here is a small code extract to show better what I mean.

training_net.py

import joblib
from sklearn.preprocessing import StandardScaler

# scale training and test data with the same fitted scaler
scaler = StandardScaler()
scaler.fit(training_data)

training_data = scaler.transform(training_data)
test_data = scaler.transform(test_data)

clf.fit(training_data, training_label)
nn_name = "NN.pkl"
joblib.dump(clf, nn_name)

clf = joblib.load(nn_name)
print(clf.score(test_data, test_label))

prediction.py

import joblib

model_name = "NN.pkl"
clf = joblib.load(model_name)

# need to scale input_parameters before predicting!
# ?

print(clf.predict(input_parameters))
  • Why not pickle the StandardScaler too? – Vivek Kumar Jul 31 '17 at 13:31
  • Also, what you need are the attributes, not parameters. See the [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) for available attributes which will be learnt from the data. – Vivek Kumar Jul 31 '17 at 13:33
  • That was my problem: how to pickle it. Since I didn't know what "pickle" meant, I searched for the keywords 'pickle standardscaler' and found the answer on Stack Overflow itself. Thank you! And you are right, I need attributes, not parameters; thanks for the clarification. Here is the answer: https://stackoverflow.com/questions/35944783/how-to-store-scaling-parameters-for-later-use – Trillian Jul 31 '17 at 14:16
  • In your code, you are already pickling the `clf` object using `joblib.dump`. Use the same technique to save the `StandardScaler`. – Vivek Kumar Jul 31 '17 at 14:18
  • Actually that answer is exactly what you said: printing the available attributes. Could you write your comment as an answer? Then I can accept it and close the question. – Trillian Jul 31 '17 at 14:20
  • ah! True. I didn't know I could use it for anything. Thanks – Trillian Jul 31 '17 at 14:23
  • Glad I could help. And it's even [recommended by the scikit-learn documentation](http://scikit-learn.org/stable/modules/model_persistence.html) to use joblib instead of plain pickle to save scikit-learn objects. – Vivek Kumar Jul 31 '17 at 14:25

1 Answer


As Vivek Kumar answered in the comments, it is enough to pickle the `StandardScaler` as well:

joblib.dump(scaler, "scaler.pkl")

Here you can also find a similar question (and answer) I couldn't find in my first search: How to store scaling parameters for later use
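A minimal end-to-end sketch of this approach, persisting and reloading both objects (the toy data and the `MLPClassifier` settings are illustrative, not taken from the question):

```python
import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# --- training_net.py: fit the scaler and the network, persist both ---
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
y = np.array([0, 1, 0, 1])

scaler = StandardScaler().fit(X)
clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
clf.fit(scaler.transform(X), y)

joblib.dump(clf, "NN.pkl")
joblib.dump(scaler, "scaler.pkl")

# --- prediction.py: load both, apply the same transform before predicting ---
clf = joblib.load("NN.pkl")
scaler = joblib.load("scaler.pkl")

input_parameters = np.array([[0.0, 1.5]])
print(clf.predict(scaler.transform(input_parameters)))
```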
