Another user-friendly option is to use the caret
library, which makes it pretty straightforward to fit and compare regression/classification models in R. The following example code uses the GermanCredit
dataset to predict credit worthiness using a logistic regression model. The code is adapted from this blog: https://www.r-bloggers.com/evaluating-logistic-regression-models/.
library(caret)
## example from https://www.r-bloggers.com/evaluating-logistic-regression-models/
data(GermanCredit)
## 60% training / 40% test data
trainIndex <- createDataPartition(GermanCredit$Class, p = 0.6, list = FALSE)
GermanCreditTrain <- GermanCredit[trainIndex, ]
GermanCreditTest <- GermanCredit[-trainIndex, ]
## logistic regression based on 10-fold cross-validation
trainControl <- trainControl(
method = "cv",
number = 10,
classProbs = TRUE,
summaryFunction = twoClassSummary
)
fit <- train(
form = Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own +
CreditHistory.Critical,
data = GermanCreditTrain,
trControl = trainControl,
method = "glm",
family = "binomial",
metric = "ROC"
)
## AUC ROC for training data
print(fit)
## AUC ROC for test data
## See https://topepo.github.io/caret/measuring-performance.html#measures-for-class-probabilities
predictTest <- data.frame(
obs = GermanCreditTest$Class, ## observed class labels
predict(fit, newdata = GermanCreditTest, type = "prob"), ## predicted class probabilities
pred = predict(fit, newdata = GermanCreditTest, type = "raw") ## predicted class labels
)
twoClassSummary(data = predictTest, lev = levels(predictTest$obs))