R Caret Train AUC missing values in resampled performance measures

Question

When I try to train a logistic regression model on this data with caret, using the precision-recall AUC as the metric, I keep getting the following error and warning:

Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

Here is the output of the console:

Something is wrong; all the AUC metric values are missing:
      AUC        Precision          Recall            F         
 Min.   : NA   Min.   :0.9464   Min.   :0.987   Min.   :0.9663  
 1st Qu.: NA   1st Qu.:0.9464   1st Qu.:0.987   1st Qu.:0.9663  
 Median : NA   Median :0.9464   Median :0.987   Median :0.9663  
 Mean   :NaN   Mean   :0.9464   Mean   :0.987   Mean   :0.9663  
 3rd Qu.: NA   3rd Qu.:0.9464   3rd Qu.:0.987   3rd Qu.:0.9663  
 Max.   : NA   Max.   :0.9464   Max.   :0.987   Max.   :0.9663  
 NA's   :1

By the looks of it, the AUC is already NA on the first iteration of cross-validation. However, I have no NAs in my data, and the causes described in other Stack Overflow posts don't apply to me (e.g. my classification labels are "Yes" and "No", not "0" and "1", and I don't have any categorical variables). How can I resolve this issue? My code is below for reference.

library(caret)
library(MLmetrics)

set.seed(1)
#Split data into training and test sets
split <- 0.8  #Proportion of Data to use as training data (rest is test Data)
train <- createDataPartition(med_aggregate$Fraud, p = split, list = FALSE)

#Training parameters
ctrl <- trainControl(method = "repeatedcv", 
                     number = 5, 
                     repeats = 5,
                     sampling = "up", 
                     summaryFunction = prSummary
)

#Logistic reg
log_reg.fit <- train(
  Fraud ~. ,
  data = med_aggregate,
  subset = train,
  method = "glm",
  family = "binomial",
  metric = "AUC",
  trControl = ctrl
)

Here is a sample of the data frame/tibble med_aggregate:

Fraud Claims_Outpatie~ Beneficiaries_O~ Beneficiaries_I~ ClaimsPerBene_I~ TotalClaimAmtRe~
   <fct>            <int>            <int>            <int>            <dbl>            <dbl>
 1 No                  21               21                0                0             6820
 2 No                   3                3                0                0              520
 3 Yes                 42               42               32                1            13480
 4 No                   7                5                0                0             1380
 5 No                   7                7                0                0             2450
 6 No                  34               33                0                0             7810
 7 No                 193              187                0                0            51400
 8 No                   4                4                0                0              510
 9 No                   5                5                0                0              250
10 No                  15               15                0                0             2450

It looks like you used `med_aggregate_I` to create your partition but `med_aggregate` to train the model. Is that correct? You might also wish to look at this answer and the link it includes: https://stackoverflow.com/a/56525428/11071807 — Wil, Aug 02 '20 at 12:16
Sorry, I corrected that in my code afterwards, but it still gives the same error. — BaroqueFreak, Aug 02 '20 at 12:23
Something is wrong with your training data. In the summary, every precision statistic has the same value, which suggests the cross-validation or sampling is doing something funky. — StupidWolf, Aug 05 '20 at 12:25
Can you share the dataset somehow or if it is from kaggle, provide the link? — StupidWolf, Aug 05 '20 at 12:25
I do not have the source link and I do not think it is from Kaggle. I can upload it as a CSV if that helps. — BaroqueFreak, Aug 06 '20 at 09:33
You have probably solved this already, but for me, the fix was to add `classProbs = TRUE` to the `trainControl` call. — Omar Omeiri, Jun 21 '22 at 00:59
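
A minimal sketch of the `classProbs = TRUE` suggestion from the last comment, assuming the `NA` arises because `prSummary` needs class-probability columns to compute the precision-recall AUC. The data frame `sim` and its predictors `x1` and `x2` are hypothetical stand-ins for `med_aggregate`, simulated here only so the example runs on its own. The `classProbs = TRUE` line is the only change to the question's `trainControl` call.

```r
library(caret)      # train(), trainControl(), prSummary()
library(MLmetrics)  # prSummary() relies on MLmetrics for the PR AUC

set.seed(1)

# Hypothetical, simulated stand-in for med_aggregate (not the asker's data):
# an imbalanced two-class outcome with labels "No" / "Yes".
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-2.5 + 1.5 * x1 - x2)
sim <- data.frame(
  Fraud = factor(ifelse(runif(n) < p, "Yes", "No"), levels = c("No", "Yes")),
  x1 = x1,
  x2 = x2
)

# Same resampling setup as in the question, plus classProbs = TRUE
ctrl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5,
  sampling = "up",
  classProbs = TRUE,            # ask caret to pass class probabilities on
  summaryFunction = prSummary
)

fit <- train(
  Fraud ~ .,
  data = sim,
  method = "glm",
  family = "binomial",
  metric = "AUC",
  trControl = ctrl
)

# Averaged metrics (one row, because glm has no tuning grid)
print(fit$results[, c("AUC", "Precision", "Recall", "F")])

# Per-resample metrics, useful for checking that values vary across folds
print(head(fit$resample))
```

Note that `prSummary` treats the first factor level as the event of interest, so with levels `No`, `Yes` the reported precision and recall describe the `No` class. If the fraud class is the one to score, releveling the factor so that `Yes` comes first would be one way to do it.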
