
How should I interpret a very low sensitivity alongside a very high AUC in caret's train() cross-validation resampling results on the data I have trained?

Is the model performance bad?

  • You should check this question on writing reproducible code and provide additional details in order to enable more users to help you out. Here's the link: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Pj_ Aug 25 '16 at 17:20
  • It's a theoretical question! I don't think it is related to any specific dataset. My question is very general: how should I interpret it when AUC is high and sensitivity is low in cross-validation resampling results? – user6422220 Aug 25 '16 at 17:51
  • That may seem obvious, but adding reproducible code helps add new dimensions to the information you seek. It could also tell us whether your question is better suited for Cross Validated: http://stats.stackexchange.com/ :) – Pj_ Aug 25 '16 at 17:56
  • OK, I agree with you. I will try to make a reproducible example for it :) – user6422220 Aug 25 '16 at 17:57

1 Answer


This usually occurs when there is a class imbalance: the default 50% probability cutoff produces poor predictions, but the class probabilities, while poorly calibrated, still separate the classes well.

Here is an example:

library(caret)

## Simulate an imbalanced two-class data set; the large intercept
## shifts the class probabilities so that one class dominates
set.seed(1)
dat <- twoClassSim(500, intercept = 10)

## Tune an RBF SVM over 10 random (sigma, C) candidates,
## optimizing the area under the ROC curve
set.seed(2)
mod <- train(Class ~ ., data = dat, method = "svmRadial",
             tuneLength = 10,
             preProc = c("center", "scale"),
             metric = "ROC",
             trControl = trainControl(search = "random",
                                      classProbs = TRUE, 
                                      summaryFunction = twoClassSummary))

The results show a high area under the ROC curve but very low sensitivity across all candidate models:

> mod
Support Vector Machines with Radial Basis Function Kernel 

500 samples
 15 predictor
  2 classes: 'Class1', 'Class2' 

Pre-processing: centered (15), scaled (15) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 500, 500, 500, 500, 500, 500, ... 
Resampling results across tuning parameters:

  sigma       C             ROC        Sens        Spec     
  0.01124608   21.27349102  0.9615725  0.33389177  0.9910125
  0.01330079  419.19384543  0.9579240  0.34620779  0.9914320
  0.01942163   85.16782989  0.9535367  0.33211255  0.9920583
  0.02168484  632.31603140  0.9516538  0.33065224  0.9911863
  0.02395674   89.03035078  0.9497636  0.32504906  0.9909382
  0.03988581    3.58620979  0.9392330  0.25279365  0.9920611
  0.04204420  699.55658836  0.9356568  0.23920635  0.9931667
  0.05263619    0.06127242  0.9265497  0.28134921  0.9839818
  0.05364313   34.57839446  0.9264506  0.19560317  0.9934489
  0.08838604   47.84104078  0.9029791  0.06296825  0.9955034

ROC was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01124608 and C = 21.27349.
– topepo
  • That means that if sensitivity/specificity is calculated based on the default 50% probability cutoff, then it is meaningless to judge the model performance by sensitivity/specificity, and it is better to look at AUC. What do you say? Also, sometimes I see that the logloss is negative. When does that happen? – user6422220 Sep 03 '16 at 07:16