When I train two models, one with classProbs=TRUE and the other without computing class probabilities, I get different results:
library(caret)
set.seed(7)
myControl <- trainControl(method = 'cv', number = 3, savePredictions = TRUE)
set.seed(7)
model <- train(Species ~ ., iris, tuneLength = 4, method = 'svmRadial', trControl = myControl)
set.seed(7)
myControl <- trainControl(method = 'cv', number = 3, savePredictions = TRUE, classProbs = TRUE)
set.seed(7)
modelProbs <- train(Species ~ ., iris, tuneLength = 4, method = 'svmRadial', trControl = myControl)
The results for the model with classProbs=FALSE (variable model) are:
C Accuracy Kappa Accuracy SD Kappa SD
0.25 0.9266667 0.8900100 0.03055050 0.04584410
0.50 0.9333333 0.8999840 0.02309401 0.03469643
1.00 0.9400000 0.9099880 0.02000000 0.03004800
2.00 0.9400000 0.9100059 0.02000000 0.02999416
And the results for the model with classProbs=TRUE (variable modelProbs) are:
C Accuracy Kappa Accuracy SD Kappa SD
0.25 0.9266667 0.890010 0.03055050 0.04584410
0.50 0.9333333 0.899984 0.02309401 0.03469643
1.00 0.9400000 0.909988 0.02000000 0.03004800
2.00 0.9466667 0.919980 0.02309401 0.03466529
This even leads to different final models after parameter selection: C=1 when classProbs=FALSE, but C=2 when classProbs=TRUE.
I have found that the predictions of the two models agree everywhere except on cases where the classifier is unsure which class to predict. For example:
> model$pred[423,]
pred obs rowIndex sigma C Resample
423 versicolor versicolor 69 0.8071298 0.25 Fold3
> modelProbs$pred[423,]
pred obs setosa versicolor virginica rowIndex sigma C Resample
423 virginica versicolor 0.03307154 0.4813102 0.4856182 69 0.8071298 0.25 Fold3
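For reference, this is roughly how I located the disagreeing rows. I am assuming that the two $pred data frames line up row by row, which should hold here because both runs use the same seed and therefore the same folds:

```r
library(caret)

set.seed(7)
ctrl <- trainControl(method = 'cv', number = 3, savePredictions = TRUE)
set.seed(7)
model <- train(Species ~ ., iris, tuneLength = 4, method = 'svmRadial',
               trControl = ctrl)

set.seed(7)
ctrlProbs <- trainControl(method = 'cv', number = 3, savePredictions = TRUE,
                          classProbs = TRUE)
set.seed(7)
modelProbs <- train(Species ~ ., iris, tuneLength = 4, method = 'svmRadial',
                    trControl = ctrlProbs)

# Indices of held-out predictions where the two models disagree
disagree <- which(model$pred$pred != modelProbs$pred$pred)

# Inspect the disagreeing rows, with the class probabilities alongside
modelProbs$pred[disagree, ]
```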
In this experiment the differences are very small, but I have tried with more complex data and the differences are huge. Can anybody explain how the classProbs option affects the predictions? I thought it was only used to obtain the probabilities for each class, and that it didn't change the results.
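To try to isolate where the difference comes from, I also fitted the underlying kernlab model directly. As far as I understand, caret's svmRadial wraps kernlab::ksvm, and my assumption is that classProbs toggles its prob.model argument; this sketch compares the raw class predictions against the classes implied by the Platt-scaled probabilities:

```r
library(kernlab)

# Fit the same RBF SVM twice on the full data, with and without
# the probability model (assumed equivalent to classProbs in caret)
set.seed(7)
fit_plain <- ksvm(Species ~ ., data = iris, kernel = 'rbfdot', C = 0.25)
set.seed(7)
fit_probs <- ksvm(Species ~ ., data = iris, kernel = 'rbfdot', C = 0.25,
                  prob.model = TRUE)

# Classes from the raw decision values
p_plain <- predict(fit_plain, iris)

# Classes implied by the fitted class probabilities
probs  <- predict(fit_probs, iris, type = 'probabilities')
p_prob <- colnames(probs)[max.col(probs)]

# Count cases where the probability-based class differs from the raw one
sum(p_prob != as.character(p_plain))
```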