I am training a logistic regression model in R, using a training set and a test set. My data has a binary output, stored in the data file as the integers 1 or 0 with no missing values. There are more 1s than 0s (the proportion is 70/30).
The results are very different depending on whether I convert the output to a factor or not. Namely, if I keep the output variable as numeric 0/1 and write
m1 <- glm(output~.,data=dt_tr,family=binomial())
then the model fits without warnings or errors. If I instead write
dt$output<-as.factor(ifelse(dt$output == 1, "Good", "Bad"))
m1 <- glm(output~.,data=dt_tr,family=binomial())
I get completely different performance! What could be causing this?
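One way to narrow this down is to check how `glm` reads the factor coding. With `family = binomial()`, a factor response is treated as failure for the first level and success for all other levels, and `as.factor` sorts the levels alphabetically, so "Bad" becomes the failure class. Below is a minimal sketch on simulated data; the data frame `sim` and the columns `x1`, `x2`, `y_num`, `y_fac` are made up for illustration and are not my real data.

```r
# Simulated data; every name here is a hypothetical stand-in.
set.seed(1)
n   <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y_num <- rbinom(n, size = 1, prob = plogis(0.8 + sim$x1 - 0.5 * sim$x2))
sim$y_fac <- as.factor(ifelse(sim$y_num == 1, "Good", "Bad"))

# Levels are sorted alphabetically, so "Bad" comes first (failure)
# and "Good" comes second (success).
print(levels(sim$y_fac))

# Fit the same model with both codings of the response.
fit_num <- glm(y_num ~ x1 + x2, data = sim, family = binomial())
fit_fac <- glm(y_fac ~ x1 + x2, data = sim, family = binomial())

# Compare the estimated coefficients of the two fits.
print(all.equal(coef(fit_num), coef(fit_fac)))
```

If the coefficients agree here, the coding itself is probably not what changes the fit, and it would be worth checking other steps, such as which data frame the conversion is applied to and how the train/test split is made.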
To be more precise, after fitting the model I do the following:
library(ROCR)  # provides prediction() and performance()
m1_score <- predict(m1, newdata = dt_test, type = 'response')
m1_pred <- prediction(m1_score, dt_test$output)
m1_perf <- performance(m1_pred, "tpr", "fpr")
#ROC
plot(m1_perf, lwd=2, main="ROC")
I get very different ROC curves and AUCs for the two versions.
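To compare the two codings like for like, both models can be scored on one fixed split. The sketch below is only an illustration on simulated data: the names `sim`, `idx`, `tr`, `te`, `tr_f` and the helper `auc_rank` are hypothetical, and the AUC is computed with the rank (Mann-Whitney) formula in base R rather than with ROCR, so that the snippet runs without extra packages.

```r
# Simulated data; all names are hypothetical stand-ins for dt, dt_tr, dt_test.
set.seed(42)
n   <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$output <- rbinom(n, size = 1, prob = plogis(0.8 + sim$x1 - 0.5 * sim$x2))

# One fixed train/test split, reused for both codings.
idx <- sample(n, size = 700)
tr  <- sim[idx, ]
te  <- sim[-idx, ]

# Factor-coded copy of exactly the same training rows.
tr_f <- transform(tr, output = as.factor(ifelse(output == 1, "Good", "Bad")))

# AUC via the rank formula; `positive` is a logical vector marking the
# positive class for each score.
auc_rank <- function(score, positive) {
  r  <- rank(score)
  n1 <- sum(positive)
  n0 <- sum(!positive)
  (sum(r[positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

m_num <- glm(output ~ ., data = tr,   family = binomial())
m_fac <- glm(output ~ ., data = tr_f, family = binomial())

# Score both models on the same test rows.
s_num <- predict(m_num, newdata = te, type = "response")
s_fac <- predict(m_fac, newdata = te, type = "response")

auc_num <- auc_rank(s_num, te$output == 1)
auc_fac <- auc_rank(s_fac, te$output == 1)
print(c(numeric = auc_num, factor = auc_fac))
```

Keeping the split and the definition of the positive class identical for both runs isolates the effect of the response coding from everything else in the pipeline.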