8

I'm training two SVM models on my data using two different packages and getting vastly different results. Is this to be expected?

model1 using e1071

library('e1071')
model1 <- svm(myFormula, data=trainset,type='C',kernel='linear',probability = TRUE)
outTrain <- predict(model1, trainset, probability = TRUE)
outTest <- predict(model1, testset, probability = TRUE)
train_pred <- attr(outTrain, "probabilities")[,2]
test_pred <- attr(outTest, "probabilities")[,2]
calculateAUC(train_pred,trainTarget)
calculateAUC(test_pred,testTarget)

model2 using caret

model2 <- train(myFormula,data=trainset,method='svmLinear')
train_pred <- predict(model2, trainset)
test_pred  <- predict(model2, testset)
calculateAUC(train_pred,trainTarget)
calculateAUC(test_pred,testTarget)

calculateAUC() is a function I defined to calculate the AUC value, given the predicted and the actual values of the target. I see the values as:

model1 (e1071)

1
0.8567979

model2 (caret)

0.9910193
0.758201

Is this something that is possible? Or am I doing this wrong?

I can provide sample data if that would be helpful.

Richie Cotton
user2175594

3 Answers

7

Yes, it is possible. Likely causes include:

  • Different C values: the e1071 default is 1; caret may use a different value.
  • Data scaling: e1071 scales your input by default, while caret itself does not (although kernlab's SVM, which is the model "under the hood", does, so checking the source would be needed to be sure).
  • Different eps/maximum-iteration settings or other optimization-related thresholds.

Simply display your models' parameters after training and check whether they are the same. You will probably find a parameter whose default differs between the two libraries.
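A minimal sketch of that check, assuming both e1071 and kernlab are installed. The built-in iris data (reduced to two classes) is only a stand-in for the real dataset, and the kernlab model is fitted directly here rather than through caret:

```r
library(e1071)
library(kernlab)

# Stand-in data: two-class subset of iris (an assumption, not the asker's data)
two <- droplevels(iris[iris$Species != "setosa", ])

# Linear SVM in each package, leaving all other settings at their defaults
m1 <- svm(Species ~ ., data = two, type = "C-classification", kernel = "linear")
m2 <- ksvm(Species ~ ., data = two, type = "C-svc", kernel = "vanilladot",
           kpar = list())

# e1071 stores its settings as list elements of the fitted object
print(m1$cost)       # regularization constant
print(m1$scaled)     # which columns were scaled before fitting
print(m1$tot.nSV)    # number of support vectors

# kernlab exposes the same information through accessor functions
print(param(m2)$C)   # regularization constant
print(nSV(m2))       # number of support vectors
print(m2)            # full summary, including the kernel in use
```

To rule scaling out as the cause, both fits can be repeated with scaling disabled (`scale = FALSE` in `svm()`, `scaled = FALSE` in `ksvm()`).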

lejlot
  • I did print the models and checked these: the C value in both is 1, and my variables are already scaled. The optimization threshold could be different, though. – user2175594 Sep 20 '13 at 10:59
  • How scaled? `e1071` normalizes dimension-wise to mean=0 and variance=1, not by simple linear squashing to [0,1], so scaling **is** important. Try to manually turn off scaling in both. – lejlot Sep 20 '13 at 11:02
  • `caret` uses `kernlab`. Try printing out `model2$finalModel` to see what the differences are and/or fit the same model in `e1071` and `kernlab` just to check. – topepo Sep 20 '13 at 19:24
  • Also, your `train` code must be more complicated than that. Most recent versions of `caret` do not automatically produce class probabilities, and `trainControl` is missing above, as is `predict(..., type = "prob")`. Also, where did `calculateAUC` come from? You should supply a reproducible example and the results of `sessionInfo()`. – topepo Sep 20 '13 at 19:26
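
The point about class probabilities matters for the AUC comparison: an AUC computed from hard class labels is generally not comparable to one computed from continuous scores. Since `calculateAUC` was never shown, the following is only a hypothetical stand-in (the name `auc_rank` and its rank-based formula are assumptions), written in base R to illustrate the difference:

```r
# Rank-based AUC (Mann-Whitney statistic); a hypothetical stand-in for the
# asker's calculateAUC(), which is not shown in the question.
# scores: numeric predictions, higher means "more likely positive"
# labels: 0/1 or logical, with 1/TRUE marking the positive class
auc_rank <- function(scores, labels) {
  pos <- as.logical(labels)
  r <- rank(scores)          # ties receive average ranks
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1)

# Continuous scores (e.g. class probabilities) keep the ranking information
auc_scores <- auc_rank(c(0.1, 0.4, 0.35, 0.8), labels)   # 0.75

# Hard class predictions collapse the ranking to two levels
auc_hard <- auc_rank(c(0, 1, 0, 1), labels)              # 0.5

print(c(scores = auc_scores, hard = auc_hard))
```

With caret, continuous scores would come from training with `trainControl(classProbs = TRUE)` and predicting with `type = "prob"`, as the comment above suggests.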
5

I have observed that kernlab defines the RBF kernel as

rbf(x,y) = exp(-sigma * euclideanNorm(x-y)^2)

but according to this wiki link, the rbf kernel should be

rbf(x,y) = exp(-euclideanNorm(x-y)^2/(2*sigma^2))

which is also more intuitive since two close samples with a large sigma value will lead to a higher similarity matching.

I am not sure what e1071 svm uses (native code libsvm?)

I know this is an old thread, but I hope someone can enlighten me on why there is a difference. A small example for comparison:

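library(kernlab)  # provides rbfdot()

set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
sigma <- 100

# kernlab's parameterization: exp(-sigma * ||x - y||^2)
rbf <- rbfdot(sigma=sigma)
rbf(x, y)
# Wikipedia's parameterization: exp(-||x - y||^2 / (2 * sigma^2))
exp( -sum((x-y)^2)/(2*sigma^2) )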

I would expect the kernel value to be close to 1 (since x and y are drawn with standard deviation 1, while the kernel sigma is 100). This is observed only in the second case.
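
The two formulas describe the same kernel family under different parameterizations: kernlab documents its `sigma` as an inverse kernel width, so a bandwidth `s` in the Wikipedia form corresponds to `sigma = 1 / (2 * s^2)` in the kernlab form. A small base R sketch of that conversion (the function names below are illustrative, not part of any package):

```r
# Inverse-width form, as used by kernlab's rbfdot(): exp(-sigma * ||x - y||^2)
rbf_inverse_width <- function(x, y, sigma) exp(-sigma * sum((x - y)^2))

# Bandwidth form, as on Wikipedia: exp(-||x - y||^2 / (2 * s^2))
rbf_bandwidth <- function(x, y, s) exp(-sum((x - y)^2) / (2 * s^2))

set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
s <- 100

k_band <- rbf_bandwidth(x, y, s)

# Same bandwidth expressed as an inverse width
k_inv <- rbf_inverse_width(x, y, sigma = 1 / (2 * s^2))

# Passing 100 directly as an inverse width gives a very narrow kernel instead
k_narrow <- rbf_inverse_width(x, y, sigma = 100)

print(all.equal(k_band, k_inv))
print(c(bandwidth = k_band, converted = k_inv, narrow = k_narrow))
```

So a large kernlab `sigma` means a narrow kernel, which is why `rbfdot(sigma = 100)` returns a value near 0 for these points.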

jMathew
1

First, note that svmLinear relies on kernlab. You can use e1071 directly from caret by replacing the svmLinear method with svmLinear2 (see the detailed list of models and the libraries they depend on in the docs).

Now, note that the two libraries produce identical results, provided you pass them the right parameters. I recently benchmarked these methods and noted that passing the following parameters ensures the same results:

model_kernlab <- kernlab::ksvm(
  x = X,
  y = Y,
  scaled = TRUE,
  C = 5,
  kernel = "rbfdot",
  kpar = list(sigma = 1),
  type = "eps-svr",
  epsilon = 0.1
)

model_e1071 <- e1071::svm(
  x = X,
  y = Y,
  cost = 5,
  scale = TRUE,
  kernel = "radial",
  gamma = 1,
  type = "eps-regression",
  epsilon = 0.1
)

Note the different names (kernlab / e1071):

  • C / cost
  • sigma / gamma
  • eps / epsilon
  • rbfdot / radial
  • ...
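
A rough way to check the agreement yourself, assuming both packages are installed. The simulated `X` and `Y` below are made up for illustration and are not the data from my benchmark; the script prints the largest prediction difference rather than assuming a particular value:

```r
library(e1071)
library(kernlab)

# Simulated regression data (an assumption for illustration only)
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
Y <- sin(X[, 1]) + 0.1 * rnorm(100)

# Same settings expressed in each package's own argument names
model_kernlab <- kernlab::ksvm(x = X, y = Y, scaled = TRUE, C = 5,
                               kernel = "rbfdot", kpar = list(sigma = 1),
                               type = "eps-svr", epsilon = 0.1)
model_e1071 <- e1071::svm(x = X, y = Y, scale = TRUE, cost = 5,
                          kernel = "radial", gamma = 1,
                          type = "eps-regression", epsilon = 0.1)

# kernlab returns a one-column matrix, e1071 a named vector; flatten both
pred_kernlab <- as.numeric(predict(model_kernlab, X))
pred_e1071 <- as.numeric(predict(model_e1071, X))

# Largest disagreement between the two sets of fitted values
max_abs_diff <- max(abs(pred_kernlab - pred_e1071))
print(max_abs_diff)
```

Small numerical differences can remain because the two solvers stop at their own tolerances.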

RUser4512