I know that an svm model needs preprocessing that converts categorical variables into dummy variables. However, when I use e1071's svm function to fit a model with unconverted data (see train and test), no error pops up. I assume the function converts them automatically.
However, when I use the converted data (see train2 and test2) to fit an svm model, the function gives me a different result: the predictions p1 and p2 are not the same.
Could anyone tell me what happens to the unconverted data? Does the function just ignore the categorical variables, or does something else happen?
library(e1071)
library(dummies)
set.seed(0)
x = data.frame(matrix(rnorm(200, 10, 10), ncol = 5)) #fake numerical predictors
cate = factor(sample(LETTERS[1:5], 40, replace=TRUE)) #fake categorical variables
y = rnorm(40, 50, 10) #fake response
data = cbind(y,cate,x)
ind = sample(40, 30, replace=FALSE)
train = data[ind, ]
test = data[-ind, ]
#without dummy
svm.model = svm(y~., train)
p1 = predict(svm.model, test)
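#with dummy: one indicator column per level of cate
train2 = cbind(train[,-2], dummy(train[,2], drop = FALSE)) #drop = FALSE keeps unused levels
colnames(train2) = c('y', paste0('X',1:5), levels(train[,2]))
test2 = cbind(test[,-2], dummy(test[,2], drop = FALSE))
colnames(test2) = c('y', paste0('X',1:5), levels(test[,2]))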
svm.model2 = svm(y~., train2)
p2 = predict(svm.model2, test2)
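
To narrow down what might be happening to the factor column, here is a minimal base R sketch of how a formula expands a factor into indicator columns through model.matrix. Whether e1071's svm uses exactly this expansion internally is an assumption, not something confirmed above; the toy data frame and the names mm_int and mm_noint are hypothetical and only mirror the structure of the data in the question.

```r
set.seed(0)

# Hypothetical toy data with the same layout as above:
# a numeric response, one factor, one numeric predictor
toy <- data.frame(
  y    = rnorm(8),
  cate = factor(rep(c("A", "B", "C", "D"), 2)),
  X1   = rnorm(8)
)

# With an intercept, the default treatment contrasts drop the first level
mm_int <- model.matrix(y ~ ., toy)
print(colnames(mm_int))

# Without an intercept, the factor gets one indicator column per level
mm_noint <- model.matrix(y ~ . - 1, toy)
print(colnames(mm_noint))
```

Comparing these column sets with the columns of train2 shows whether the hand-made dummies encode the factor the same way a formula would. Any remaining difference between p1 and p2 could then come from other preprocessing, such as how the columns are scaled.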