1

I have been using e1071::svm(...,probability=TRUE) in R to fit a binary SVM classifier and then using predict.svm() to get the probabilities for both the training sample and a test sample. When I converted the probabilities to log(odds) and plotted them against the decision.values, I found there was a discontinuity in the predictions:

Plot of log(odds) = log(prob/(1-prob)) vs. Decision Values

This is happening with other models as well, whenever the probability is below about 0.25%; there is consistently a gap from log(odds)= -5.98 to -10.86. Note that this does not occur at a fixed decision.value (which varies with the model). I believe it may also be happening at high probabilities (>99%) as well.

The red and green lines are linear fits for the predictions with log(odds)<-8 and >-8, respectively. The coefficients of the latter agree with the probA and probB outputs returned with the svm object. I have seen other cases where the gap occurs from from +5.98 to +10.86 (only).

Here is the example using the iris dataset:

require("datasets")
require("e1071")
iris$is.setosa <- as.numeric(iris$Species=="setosa")
set.seed(8675309)
fit <- svm(
   is.setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,    
   data=iris,probability=T,cost=0.01,kernel="linear",type="C-classification")
preds <- predict(fit,prob=TRUE,newdata=iris,decision=T)
DVs <- attr(preds,"decision.values")[,1]
probs <- attr(preds,"probabilities")[,"1"]
logodds <- log(probs/(1-probs))
plot(DVs,logodds,xlab="decision.values",ylab="log(odds)",main="IRIS dataset")
cat("Coefficents of probability model reported by svm():\n")
print(fit[c("probA","probB")])
fit <- lm(logodds ~ DVs,subset=which(logodds> -8))
cat("fit of logodds ~ DVs when log(odds) greater than -8:\n")
print(summary(fit))
abline(fit,col="green",lty=3)
fit <- lm(logodds ~ DVs,subset=which(logodds< -8))
abline(fit,col="red",lty=3)

Has anyone else seen this behavior? Any idea what might be causing it? Thanks!

mwrowe
  • 91
  • 4
  • It would be helpful to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the code you are using to calculate the values you are plotting. – MrFlick Apr 26 '16 at 22:03
  • Reproducible example added. – mwrowe Apr 27 '16 at 19:10

0 Answers0