
I'm running an XGBoost binary classification model with 375 training observations, 125 testing observations, and 19 features. Below are my arguments:

Boosted Tree Model Specification (classification)

Main Arguments:
  mtry = 13
  trees = 100
  min_n = 3
  tree_depth = 5
  learn_rate = 1.57515292756891e-09
  loss_reduction = 0.801337205143451
  sample_size = 0.967102140800562

Computational engine: xgboost 
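
For context, here is a minimal sketch of how a specification like the one printed above is typically built with parsnip; the formula and the `train_data` object are placeholders, not my actual (unshared) data:

```r
library(tidymodels)

# Sketch only: `outcome` and `train_data` are placeholders for the real
# outcome column and training set
xgb_spec <- boost_tree(
  mtry = 13,
  trees = 100,
  min_n = 3,
  tree_depth = 5,
  learn_rate = 1.57515292756891e-09,
  loss_reduction = 0.801337205143451,
  sample_size = 0.967102140800562
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_fit <- fit(xgb_spec, outcome ~ ., data = train_data)
```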

The model appears to perform well based on the confusion matrix:

[Confusion matrix]

But there is no spread in the predicted class probabilities; every prediction sits at roughly .50001 vs .49999:

[ROC curve]
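
A rough sketch of how I'm looking at the predictions; `xgb_res` stands in for the result of `last_fit()`/`fit_resamples()`, and `.pred_Yes` / `Truth` are placeholders for my actual probability and truth columns:

```r
# Inspect the spread of the predicted probabilities and the ROC curve
preds <- collect_predictions(xgb_res)

summary(preds$.pred_Yes)   # every value sits right around 0.5

preds %>%
  roc_curve(Truth, .pred_Yes) %>%
  autoplot()
```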

I'm new to using XGBoost. Is this an overfitting issue, a sample size issue, or am I misspecifying the arguments? I feel like there is an obvious issue that I would love to be educated about.

Using R, tidymodels

Curtis
  • How did you generate the ROC curve plot? Using Yardstick e.g. https://yardstick.tidymodels.org/reference/roc_curve.html ? – jared_mamrot Jan 27 '21 at 03:12
  • Yes - `collect_predictions() %>% roc_curve(., Truth, .pred_class)` – Curtis Jan 27 '21 at 16:52
  • Hmm...Sorry @Curtis, not sure what's going on. If the confusion matrix above is based on training data then it indicates overfitting (extreme overfitting) where the model is unable to find any difference between groups in the test data, but that seems unlikely given your parameters. Are you able to share your data? – jared_mamrot Jan 27 '21 at 22:46
  • @jared_mamrot unfortunately I'm not but I appreciate your thoughts. I'll re-examine the features as many of them are zero-inflated with low variation. I'm not sure if that would impact this issue but regardless it is AN issue that I'll need to address. – Curtis Jan 28 '21 at 00:32
  • If you can replicate the issue using a publicly-available dataset (e.g. `install.packages("titanic"); library(titanic); data("Titanic"); training_data <- titanic_train`) you could repost the question and see what others say (see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – jared_mamrot Jan 28 '21 at 00:44
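
Along the lines of the last comment, a minimal reproducible sketch using the titanic package; the chosen predictors are illustrative only, and the near-zero learn_rate mirrors the specification above:

```r
library(tidymodels)
library(titanic)

# Illustrative predictors from titanic_train, not the original 19 features
dat <- titanic_train %>%
  select(Survived, Pclass, Sex, Age, Fare) %>%
  mutate(Survived = factor(Survived)) %>%
  drop_na()

split <- initial_split(dat, prop = 0.75, strata = Survived)

xgb_spec <- boost_tree(trees = 100, learn_rate = 1.57515292756891e-09) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_fit <- fit(xgb_spec, Survived ~ ., data = training(split))

# Predicted class probabilities on the held-out data
predict(xgb_fit, testing(split), type = "prob") %>%
  summary()
```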

0 Answers