
I am training logistic regression in R, using a train set and a test set. I have some data with a binary output. In the data file the output is the integer 1 or 0, with no missing values. There are more 1s than 0s (the proportion is 70/30).

The result of the logistic regression is very different depending on whether I convert the output to a factor or not. If I keep the output variable as numeric 0/1 and write

m1 <- glm(output~.,data=dt_tr,family=binomial())

then I get something without warnings or errors, but if I write

dt$output<-as.factor(ifelse(dt$output == 1, "Good", "Bad"))
m1 <- glm(output~.,data=dt_tr,family=binomial())

I get completely different performance! What could be the cause?

To be more precise, after training LR I do the following:

library(ROCR)  # provides prediction() and performance()
m1_score <- predict(m1, dt_test, type = 'response')
m1_pred <- prediction(m1_score, dt_test$output)
m1_perf <- performance(m1_pred, "tpr", "fpr")
# ROC
plot(m1_perf, lwd = 2, main = "ROC")

I get very different ROCs and AUCs.
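For reference, the AUC itself can be pulled from the same ROCR objects used above for the ROC curve (a minimal sketch; the toy scores and labels are made up just to keep it self-contained):

```r
library(ROCR)  # prediction() and performance() come from this package

# toy labels and scores, loosely correlated, standing in for dt_test$output and m1_score
set.seed(1)
labels <- rbinom(50, 1, 0.5)
scores <- labels * 0.4 + runif(50)

pred <- prediction(scores, labels)
auc  <- performance(pred, "auc")@y.values[[1]]   # scalar AUC
```

Comparing this scalar between the numeric-output and factor-output runs makes the "very different AUCs" claim concrete.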


2 Answers


Without seeing your data, I would guess that converting your response variable to a factor is causing the problem.

Your original data are binary 1/0, so when they are processed as numbers during the regression they are treated literally as 1 and 0. But when you turn them into factors, R internally codes the levels as 1 and 2:

x <- c(0, 1, 1, 0, 0, 1, 1)
y <- as.factor(ifelse(x == 1, "Good", "Bad"))
as.numeric(y)
# [1] 1 2 2 1 1 2 2
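As a side note, you can check how R will order the levels, and therefore which outcome a binomial `glm` models, with base R's `levels()` and `relevel()` (a small sketch, reusing the toy vector from the answer above):

```r
x <- c(0, 1, 1, 0, 0, 1, 1)
y <- as.factor(ifelse(x == 1, "Good", "Bad"))
levels(y)                  # "Bad" "Good": levels are sorted alphabetically
# For family = binomial(), glm() treats the FIRST level as failure,
# so the model estimates P(output == "Good"), the same event as output == 1.
# To flip which level is the reference:
y2 <- relevel(y, ref = "Good")
levels(y2)                 # "Good" "Bad"
```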

It was my silly mistake: I just forgot to set the seed. The only thing I would like to add is that if you use random forest, you must convert the output to a factor, otherwise R will treat it as numeric data and run regression instead of classification.
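For reference, a minimal sketch of a reproducible 70/30 train/test split with `set.seed` (the data frame and its columns here are made up, standing in for the original `dt`):

```r
set.seed(42)  # fix the RNG so the split (and anything downstream) is repeatable

# toy data in place of the original dt
dt <- data.frame(x = rnorm(100), output = rbinom(100, 1, 0.7))

idx     <- sample(seq_len(nrow(dt)), size = 0.7 * nrow(dt))
dt_tr   <- dt[idx, ]
dt_test <- dt[-idx, ]
```

Without the `set.seed` call, `sample()` produces a different split on every run, so the two models were likely trained and scored on different partitions.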
