
I'm using the randomForest package in R to create a model that classifies cases as disease (1) or disease-free (0):

classify_BV_100t <- randomForest(bv.disease~., data=RF_input_BV_clean, ntree = 100, localImp = TRUE)

print(classify_BV_100t)

Call:
 randomForest(formula = bv.disease ~ ., data = RF_input_BV_clean,      ntree = 100, localImp = TRUE) 
           Type of random forest: classification
                 Number of trees: 100
No. of variables tried at each split: 53

    OOB estimate of  error rate: 8.04%
Confusion matrix:
    0  1 class.error
0 510  7  0.01353965
1  39 16  0.70909091

My confusion matrix shows that the model is good at classifying 0 (no disease), but very bad at classifying 1 (disease).

But when I plot ROC curves, they give the impression that the model is pretty good.

Here are the two different ways I plotted the ROC:

  1. (Using https://stats.stackexchange.com/questions/188616/how-can-we-calculate-roc-auc-for-classification-algorithm-such-as-random-forest)

    library(pROC)
    # ROC built from the out-of-bag vote fraction for class 1
    rf.roc <- roc(RF_input_BV_clean$bv.disease, classify_BV_100t$votes[, 2])
    plot(rf.roc)
    auc(rf.roc)
    
  2. (Using How to compute ROC and AUC under ROC after training using caret in R?)

    library(ROCR)
    predictions <- as.vector(classify_BV_100t$votes[,2])
    pred <- prediction(predictions, RF_input_BV_clean$bv.disease)
    
    perf_AUC <- performance(pred,"auc") #Calculate the AUC value
    AUC <- perf_AUC@y.values[[1]]
    
    perf_ROC <- performance(pred,"tpr","fpr") #plot the actual ROC curve
    plot(perf_ROC, main="ROC plot")
    text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))
    

These are the ROC plots from 1 and 2:

[ROC plot 1]

[ROC plot 2]

Both methods give me an AUC of 0.8621593.

Does anyone know why the results from the random forest confusion matrix don't seem to add up with the ROC/AUC?

Alicia

2 Answers


I don't believe there is anything wrong with your ROC plots, and your assessment of the discrepancy is right on.

The high AUC is a product of the very high true negative rate. The ROC curve takes into account sensitivity (a measure of how well true positives are captured) and specificity (a measure of how well true negatives are captured). Because your specificity is very high, that metric effectively carries the model's lower sensitivity and keeps the AUC up: from your confusion matrix, specificity is 510/517 ≈ 0.99 while sensitivity is only 16/55 ≈ 0.29. Yes, it's a high AUC, but as you mentioned, the model is only good at predicting negatives.

I'd recommend calculating additional metrics (sensitivity/true positive rate, specificity, false positive rate, precision, ...) and evaluating the combination of all of them as you assess your model. AUC is a useful quality metric, but it means a lot more with those other metrics behind it.
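
For example, caret's confusionMatrix() reports sensitivity, specificity, predictive values, and balanced accuracy in one call. A minimal sketch, assuming the classify_BV_100t model and RF_input_BV_clean data from the question, and that bv.disease is a factor with "1" as the positive class:

library(caret)

# OOB class predictions are stored in the $predicted component of the forest
confusionMatrix(classify_BV_100t$predicted,
                RF_input_BV_clean$bv.disease,
                positive = "1")

# The same headline numbers by hand, from the confusion matrix in the question:
# sensitivity = 16 / (16 + 39) ≈ 0.29  (poor at catching disease cases)
# specificity = 510 / (510 + 7) ≈ 0.99 (excellent at catching disease-free cases)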

carverd

To add to @DanCarver's answer, you can also change the cutoff probability at which you predict an outcome as 0 or 1. By default, randomForest uses a cutoff of 0.5 for each class in a two-class problem. However, if, say, a false negative (an incorrect prediction of 0) is more costly than a false positive (an incorrect prediction of 1), you can use a lower cutoff probability for predicting class 1.

Here's an example using the BreastCancer data:

library(randomForest)
library(mlbench)
data(BreastCancer)
library(caret)

# Limit data frame to complete cases
d = BreastCancer[complete.cases(BreastCancer),]

# Run random forest model
set.seed(10)
m1 = randomForest(Class ~ Bare.nuclei + Marg.adhesion, data=d)
m1

# Generate data frame of predictions
pred = data.frame(predict(m1, type="prob"), 
                  actual=d$Class, 
                  thresh0.5=predict(m1))

# Add prediction if we set probability threshold of 0.3 (instead of 0.5) 
# for classifying a prediction as "malignant"
pred$thresh0.3 = factor(ifelse(pred$malignant > 0.3, "malignant", "benign"))

# Look at confusion matrix for each probability threshold    
confusionMatrix(pred$thresh0.5, pred$actual)
confusionMatrix(pred$thresh0.3, pred$actual)

Below is a portion of the output of the confusionMatrix function. Note that with the lower threshold, we capture more true positives (220 instead of 214), but at the expense of also getting more false positives (28 instead of 20). This might be a good tradeoff if a false negative is more costly than a false positive. This article discusses tuning randomForest models to optimize the probability threshold.

Threshold probability 0.5 for predicting malignant

           Reference
Prediction  benign malignant
  benign       424        25
  malignant     20       214

Threshold probability 0.3 for predicting malignant

           Reference
Prediction  benign malignant
  benign       416        19
  malignant     28       220
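
As an aside, randomForest itself has a cutoff argument (one value per class, in the order of the factor levels) that shifts the voting threshold at training time, so the printed OOB confusion matrix already reflects it. A rough, untested sketch of the same 0.3 threshold on this data, reusing the objects defined above:

set.seed(10)
# cutoff is in factor-level order (benign, malignant); with two classes,
# c(0.7, 0.3) predicts "malignant" whenever its vote fraction exceeds 0.3
m2 = randomForest(Class ~ Bare.nuclei + Marg.adhesion, data=d,
                  cutoff = c(0.7, 0.3))
m2

confusionMatrix(m2$predicted, d$Class, positive = "malignant")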

eipi10