
I am trying to compare the predictive accuracy of a logistic regression model and a neural network (ANN) on a dataset. Judging by the confusion matrices, the ANN performs better than the logistic regression model. However, the ROC curves suggest that the logistic regression model is better. I am wondering whether there is something wrong with my code for the ROC curves.

For context, here is my procedure. First, I divided the dataset into training and testing data.

data = read.csv("heart.csv", header=TRUE)

set.seed(300)
index = sample(seq_len(nrow(data)), size = samplesize) # For logistic 
train <- data[index,]
test <- data[-index,]

normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}
scaled <- as.data.frame(lapply(data, normalize))
index = sample(seq_len(nrow(scaled)), size = samplesize) # For ANN
trainset <- scaled[index, ] 
testset <- scaled[-index, ]
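
A minimal sketch of a single split reused for both the raw and the scaled data, so that both models would be scored on the same held-out rows. It assumes a 70/30 split (the value of `samplesize` is not shown above) and uses the built-in `mtcars` data with `am` as a stand-in binary target, since `heart.csv` is not available here. Both choices are illustrative assumptions, not part of the original procedure.

```r
# Stand-in data: "am" plays the role of the binary target (assumption).
data <- mtcars[, c("am", "mpg", "hp", "wt")]

# Min-max scaling, as in the question.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- as.data.frame(lapply(data, normalize))

set.seed(300)
samplesize <- floor(0.7 * nrow(data))   # assumed 70/30 split
index <- sample(seq_len(nrow(data)), size = samplesize)

# The same index is applied to both data frames, so the rows match.
train    <- data[index, ]     # unscaled, for logistic regression
test     <- data[-index, ]
trainset <- scaled[index, ]   # scaled, for the ANN
testset  <- scaled[-index, ]
```

Drawing `index` only once means `test` and `testset` hold the same observations, which keeps the two ROC curves comparable.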

The response variable is "target", so I fit the following GLM:

glm.fit <- glm(target ~ ., data=train, family=binomial(link = "logit"),control = list(maxit = 50))

For the ANN, I used R's neuralnet package and did the following:

library(neuralnet)
nn <- neuralnet(target ~ ., data=trainset, hidden=c(3,2), act.fct = "logistic", err.fct = "sse", linear.output=FALSE, threshold=0.01)

For my ROC curves, I did the following:

For ANN:

prob = compute(nn, testset[, -ncol(testset)] )
prob.result <- prob$net.result

detach(package:neuralnet, unload = TRUE)

library(ROCR)
nn.pred = prediction(prob.result, testset$target)
pref <- performance(nn.pred, "tpr", "fpr")
plot(pref)

And for logistic regression:

prob=predict(glm.fit,type=c("response"))    

library(ROCR)

pred <- prediction(prob, test$target)    
perf <- performance(pred, measure = "tpr", x.measure = "fpr")     
plot(perf, col=rainbow(7), main="ROC curve Admissions", xlab="1 - Specificity",
     ylab="Sensitivity")
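
One thing that may be worth checking is which rows each set of probabilities refers to. Called without `newdata`, `predict()` on a `glm` object returns fitted values for the training rows, whereas passing `newdata = test` scores the held-out rows that `test$target` belongs to. The sketch below illustrates the difference using simulated data (a hypothetical stand-in for the heart dataset) and a hand-rolled, rank-based `auc()` helper so that it runs in base R. The helper and the simulated variables `x1` and `x2` are assumptions made for illustration.

```r
# Simulated stand-in for the heart data (assumption, illustration only).
set.seed(300)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$target <- rbinom(n, 1, plogis(1.5 * sim$x1 - sim$x2))

index <- sample(seq_len(n), size = floor(0.7 * n))
train <- sim[index, ]
test  <- sim[-index, ]

glm.fit <- glm(target ~ ., data = train, family = binomial(link = "logit"))

# Without newdata: one fitted probability per TRAINING row.
prob.train <- predict(glm.fit, type = "response")
# With newdata: one predicted probability per TEST row.
prob.test <- predict(glm.fit, newdata = test, type = "response")

# Rank-based AUC (Mann-Whitney statistic); hypothetical helper.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

test.auc <- auc(prob.test, test$target)
```

With ROCR loaded, `prediction(prob.test, test$target)` would take the place of the hand-rolled helper.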

I would just like some guidance in understanding why the plots seem to suggest that the logistic regression model is better when the confusion matrix suggests otherwise, and in working out what I am doing wrong.
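
For reference, a confusion matrix summarises a single cut-off, while a ROC curve sweeps over all of them, so the two can rank models differently even when the code is correct. A small sketch with made-up scores (hypothetical values, not taken from the heart data) in which model A is more accurate at a 0.5 cut-off but model B has the higher AUC:

```r
# Hypothetical labels and scores, for illustration only.
label   <- c(0, 0, 0, 0, 1, 1, 1, 1)
score.a <- c(0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90)
score.b <- c(0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45)

# Accuracy at one fixed cut-off (what a confusion matrix reflects).
accuracy <- function(score, label, cutoff = 0.5) {
  mean((score > cutoff) == (label == 1))
}

# AUC as the share of (positive, negative) pairs ranked correctly,
# counting ties as half.
auc <- function(score, label) {
  pos <- score[label == 1]
  neg <- score[label == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

acc.a <- accuracy(score.a, label)   # 0.75
acc.b <- accuracy(score.b, label)   # 0.5
auc.a <- auc(score.a, label)        # 0.9375
auc.b <- auc(score.b, label)        # 1

# Confusion matrix for model A at the 0.5 cut-off.
conf.a <- table(predicted = score.a > 0.5, actual = label)
```

Model B orders every positive above every negative, but all of its scores fall below 0.5, so at that cut-off it predicts the negative class throughout.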

Thank you for any input.

  • Read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and look into the https://github.com/tidyverse/reprex package – Bruno Jan 05 '20 at 01:08
  • Also, this question would be better asked at Cross Validated – Bruno Jan 05 '20 at 01:09
  • There are many potential reasons for the AUC and the confusion matrix to disagree. For a few examples, see https://stackoverflow.com/q/59398046/333599, https://stackoverflow.com/q/47104129/333599 or https://stackoverflow.com/q/38387913/333599 – Calimo Jan 05 '20 at 08:22
