When I try to train a logistic regression model with caret on this data, using the area under the precision-recall curve as the metric, I keep getting the following error:
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Here is the output of the console:
Something is wrong; all the AUC metric values are missing:
      AUC        Precision          Recall            F
 Min.   : NA   Min.   :0.9464   Min.   :0.987   Min.   :0.9663
 1st Qu.: NA   1st Qu.:0.9464   1st Qu.:0.987   1st Qu.:0.9663
 Median : NA   Median :0.9464   Median :0.987   Median :0.9663
 Mean   :NaN   Mean   :0.9464   Mean   :0.987   Mean   :0.9663
 3rd Qu.: NA   3rd Qu.:0.9464   3rd Qu.:0.987   3rd Qu.:0.9663
 Max.   : NA   Max.   :0.9464   Max.   :0.987   Max.   :0.9663
 NA's   :1
By the looks of it, the AUC is already NA on the first iteration of cross-validation. However, I have no NAs in my data, and from reading other posts on Stack Overflow, the causes described there don't seem to apply to me (e.g. my classification labels are "Yes"/"No" rather than "0"/"1", and I don't have any categorical predictors). How can I resolve this issue? My code is below for reference.
library(caret)
library(MLmetrics)
set.seed(1)
#Split data into training and test sets
split <- 0.8 #Proportion of Data to use as training data (rest is test Data)
train <- createDataPartition(med_aggregate$Fraud, p = split, list = FALSE)
#Training parameters
ctrl <- trainControl(method = "repeatedcv",
                     number = 5,
                     repeats = 5,
                     sampling = "up",
                     summaryFunction = prSummary
)
#Logistic reg
log_reg.fit <- train(
  Fraud ~ .,
  data = med_aggregate,
  subset = train,
  method = "glm",
  family = "binomial",
  metric = "AUC",
  trControl = ctrl
)
Here is a sample of the data frame/tibble med_aggregate:
   Fraud Claims_Outpatie~ Beneficiaries_O~ Beneficiaries_I~ ClaimsPerBene_I~ TotalClaimAmtRe~
   <fct>            <int>            <int>            <int>            <dbl>            <dbl>
 1 No                  21               21                0                0             6820
 2 No                   3                3                0                0              520
 3 Yes                 42               42               32                1            13480
 4 No                   7                5                0                0             1380
 5 No                   7                7                0                0             2450
 6 No                  34               33                0                0             7810
 7 No                 193              187                0                0            51400
 8 No                   4                4                0                0              510
 9 No                   5                5                0                0              250
10 No                  15               15                0                0             2450
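
One thing that may be worth checking, as a guess rather than a confirmed diagnosis for this data: prSummary computes the area under the precision-recall curve from predicted class probabilities, so caret needs classProbs = TRUE in trainControl. The control object above does not set it, which could explain why Precision, Recall and F are computed while AUC comes back NA. The sketch below is a minimal example on a made-up data frame (toy, x1 and x2 are hypothetical stand-ins, not med_aggregate) with that one argument added and everything else kept as in the question.

```r
library(caret)
library(MLmetrics)  # prSummary relies on this package

set.seed(1)

# Hypothetical stand-in for med_aggregate: two numeric predictors and an
# imbalanced two-level factor outcome with labels "No" / "Yes".
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Fraud <- factor(ifelse(toy$x1 + rnorm(n) > 1.5, "Yes", "No"),
                    levels = c("No", "Yes"))

# Same resampling setup as in the question, plus classProbs = TRUE so that
# class probabilities are passed on to the summary function.
ctrl <- trainControl(method = "repeatedcv",
                     number = 5,
                     repeats = 5,
                     sampling = "up",
                     classProbs = TRUE,
                     summaryFunction = prSummary)

fit <- train(Fraud ~ .,
             data = toy,
             method = "glm",
             family = "binomial",
             metric = "AUC",
             trControl = ctrl)

# The resampled results should now include a non-missing AUC column.
print(fit$results)
```

Note that caret treats the first factor level as the event of interest, so with levels ordered "No", "Yes" the precision and recall reported here describe the "No" class. If "Yes" is the class of interest, reordering the levels (for example with relevel) before training is one option.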