I'm new to SVM and e1071. I found that the results are different every time I run the exact same code.
For example:
```r
data(iris)
library(e1071)

model <- svm(Species ~ ., data = iris[-150, ], probability = TRUE)
pred <- predict(model, iris[150, -5], probability = TRUE)
result1 <- as.data.frame(attr(pred, "probabilities"))

model <- svm(Species ~ ., data = iris[-150, ], probability = TRUE)
pred <- predict(model, iris[150, -5], probability = TRUE)
result2 <- as.data.frame(attr(pred, "probabilities"))
```
Then I got `result1` as:

```
         setosa versicolor virginica
150 0.009704854  0.1903696 0.7999255
```

and `result2` as:

```
        setosa versicolor virginica
150 0.01006306  0.1749947 0.8149423
```
and the result keeps changing every time I rerun it.
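To see how much the estimate moves, the same fit can be repeated in a loop. This is a minimal sketch that reuses the split from the code above; `n_runs` and `p_virginica` are illustrative names (not part of e1071), and 10 runs is an arbitrary choice.

```r
library(e1071)
data(iris)

n_runs <- 10                      # arbitrary number of refits
p_virginica <- numeric(n_runs)    # hypothetical holder for the results

for (i in seq_len(n_runs)) {
  # Refit on rows 1-149 and predict row 150, exactly as in the question
  model <- svm(Species ~ ., data = iris[-150, ], probability = TRUE)
  pred <- predict(model, iris[150, -5], probability = TRUE)
  # Probability matrix has one column per class, named by class label
  p_virginica[i] <- attr(pred, "probabilities")[1, "virginica"]
}

print(p_virginica)
print(range(p_virginica))  # spread of the "virginica" probability across refits
```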
Here I'm using the first 149 rows as the training set and the last row as the test set. The probabilities for each class in `result1` and `result2` are not exactly the same. I'm guessing there is some "random" process during the prediction. How is this happening?
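One way to narrow down where the randomness enters is to fit once and call `predict()` twice on the same fitted model. This is a sketch, assuming the same data split as above; `pred_a`, `pred_b` and `same_probs` are illustrative names. If `same_probs` is `TRUE`, the run-to-run differences come from refitting with `svm()`, not from `predict()` itself.

```r
library(e1071)
data(iris)

# Fit a single model
model <- svm(Species ~ ., data = iris[-150, ], probability = TRUE)

# Predict the same test row twice from that one fitted model
pred_a <- predict(model, iris[150, -5], probability = TRUE)
pred_b <- predict(model, iris[150, -5], probability = TRUE)

# Compare the two probability matrices exactly
same_probs <- identical(attr(pred_a, "probabilities"),
                        attr(pred_b, "probabilities"))
print(same_probs)
```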
I'm aware that the predicted probabilities become reproducible if I call `set.seed()` with the same value before each run. I'm not "aiming" for a fixed prediction result; I'm just curious why this happens and what steps are taken to generate the predicted probabilities.
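For reference, the `set.seed()` approach looks roughly like this. It is a minimal sketch: `fit_and_predict` is a hypothetical helper wrapping the code from the question, and the seed value 123 is arbitrary.

```r
library(e1071)
data(iris)

# Hypothetical helper: seed R's RNG, then fit and predict as in the question
fit_and_predict <- function(seed) {
  set.seed(seed)  # same seed -> same RNG state before training
  model <- svm(Species ~ ., data = iris[-150, ], probability = TRUE)
  pred <- predict(model, iris[150, -5], probability = TRUE)
  as.data.frame(attr(pred, "probabilities"))
}

result1 <- fit_and_predict(123)
result2 <- fit_and_predict(123)
print(identical(result1, result2))  # TRUE when the seed is the same
```

Using a different seed for each call would bring back the run-to-run variation, so the seed only makes the result reproducible; it does not remove the underlying randomness.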
The slight difference doesn't really matter for the iris data, since the last sample is still predicted as "virginica" either way. But when my data (with two classes, A and B) is not that "good", and an unknown sample gets a probability of 0.489 of being class A in one run and 0.521 in another, the prediction flips across 0.5, which is confusing.
Thanks!