
In the train function of the caret package it is possible to perform centering and scaling of predictors as in the following example:

knnFit <- train(Direction ~ ., data = training, method = "knn",
                preProcess = c("center","scale"))

Specifying this transformation inside train should give a more reliable estimate of the algorithm's performance during resampling.

In this case, when I use the model to predict the response for new data, do I need to center and scale the new data myself, or is this operation included in the final model?

Is the following operation sufficient?

pred <- predict(knnFit, newdata = test)

Thanks!

Roman Luštrik
amarchin
  • No, you should center and scale the data beforehand. See http://stackoverflow.com/questions/15468866/scaling-a-numeric-matrix-in-r-with-values-0-to-1 and http://stackoverflow.com/questions/15215457/standardize-data-columns-in-r – PereG Jan 07 '16 at 12:33

1 Answer


The preProcess options specified in train are stored with the fitted object and applied automatically to new data when you call predict on it, so you do not need to preprocess the new data yourself. Your operation is sufficient.
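A self-contained sketch of the workflow from the question, assuming the built-in iris data split into hypothetical training and test sets (the asker's data is not shown), with Species as the response in place of Direction. The test set is passed to predict without any manual preprocessing; the centering and scaling estimated from the training data is re-applied internally.

```r
library(caret)

# Hypothetical split of the built-in iris data into training and test sets
set.seed(42)
in_train <- sample(nrow(iris), 100)
training <- iris[in_train, ]
test     <- iris[-in_train, ]

# Centering/scaling is estimated from the training data (and re-estimated
# within each resample while tuning k)
knnFit <- train(Species ~ ., data = training, method = "knn",
                preProcess = c("center", "scale"))

# The fitted preprocessing (training column means and standard deviations)
# is stored with the model
knnFit$preProcess$mean
knnFit$preProcess$std

# predict.train applies that stored preprocessing to the raw test data
pred <- predict(knnFit, newdata = test)
table(pred, test$Species)
```

The stored means match colMeans(training[, 1:4]), which is why raw test rows can be passed straight to predict.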

Also see the extract from the caret website below. The site also has a whole section devoted to preprocessing, which is well worth reading.

You can find the caret website here.

These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.
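If you do call object$finalModel directly, the new data has to be transformed the same way first. A minimal base-R sketch of what centering and scaling amount to, assuming numeric predictors (the iris measurements, split into hypothetical train and test matrices): the test set is standardized with the training set's means and standard deviations, not its own. This illustrates the idea only; it is not caret's internal implementation.

```r
# Hypothetical split of the numeric iris columns
set.seed(42)
in_train <- sample(nrow(iris), 100)
train_x <- as.matrix(iris[in_train, 1:4])
test_x  <- as.matrix(iris[-in_train, 1:4])

# Standardize the training data; scale() records the statistics it used
train_scaled <- scale(train_x, center = TRUE, scale = TRUE)
mu    <- attr(train_scaled, "scaled:center")  # training column means
sigma <- attr(train_scaled, "scaled:scale")   # training column standard deviations

# Reuse the training statistics for the test data
test_scaled <- scale(test_x, center = mu, scale = sigma)
```

Within caret, the equivalent standalone step is to fit preProcess(..., method = c("center", "scale")) on the training predictors and call predict on the resulting object with the new data.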

phiver