This has been explained before for linear regression, but not for logistic regression. I have no NA values in any column, since I impute missing data with missForest, so my scenario is as follows:
- The survey variable has 19 levels.
- When the data is split into training and testing sets, a few levels end up with 0 observations in the training set.
- As a result, the logistic regression automatically drops these levels, because they have no observations in the training dataset.
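The situation in the list above can be reproduced on a small scale. The following is only a minimal sketch on made-up toy data (the data frame, the level names `a`, `b`, `c`, and the split are all assumptions for illustration, not my real data): a level that is declared on the factor but has no rows in the training set is dropped when `glm` builds its model frame, and `predict` then fails when the test set contains it.

```r
# Toy data (hypothetical): a factor with three declared levels
survey <- factor(c("a", "b", "a", "b", "a", "b", "a", "b", "c", "c"),
                 levels = c("a", "b", "c"))
churn  <- c(0, 1, 1, 0, 1, 0, 0, 1, 1, 0)
df <- data.frame(survey = survey, churn = churn)

training <- df[1:8, ]   # level "c" has 0 rows here but is still a declared level
testing  <- df[9:10, ]  # contains only level "c"

# The table still lists "c", with a count of 0
print(table(training$survey))

fit <- glm(churn ~ survey, family = binomial(link = "logit"), data = training)

# Levels the fitted model actually knows about (unused levels were dropped)
print(fit$xlevels$survey)

# Predicting on rows with the dropped level raises the "new level" error;
# tryCatch captures the message instead of stopping the script
res <- tryCatch(
  predict(fit, newdata = testing, type = "response"),
  error = function(e) conditionMessage(e)
)
print(res)
```

Note that `table(training$survey)` still shows the empty level, so the table alone does not reveal that the model has dropped it; `fit$xlevels` does.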
How can I handle this?
The function for LR is as follows:
LR <- function(df, churnCol){
# browser();
CM <- df
CM$Churn <- churnCol
str(CM)
names(CM)
detach()
attach(CM)
CM_LOGIT = subset(CM, select = c(SalonID,Sex,Spl_instruction,category,survey,member_check,Studio_Client,`mean(Total)`,No_Of_Visits,Churn))
CM_LOGIT <- as.data.frame(CM_LOGIT,stringsAsFactors = T)
str(CM_LOGIT)
sapply(CM_LOGIT,function(x) sum(is.na(x)))
df.CM_LOGIT <- missForest(CM_LOGIT)
summary(df.CM_LOGIT$ximp)
df.CM_LOGIT <- df.CM_LOGIT$ximp
setnames(df.CM_LOGIT, "mean(Total)", "mean_total")
summary(df.CM_LOGIT)
sapply(df.CM_LOGIT,function(x) sum(is.na(x)))
intrain<- createDataPartition(df.CM_LOGIT$Churn,p=0.7,list=FALSE)
set.seed(2019)
training<- df.CM_LOGIT[intrain,]
testing<- df.CM_LOGIT[-intrain,]
dim(training); dim(testing);
LogModel <- glm(Churn ~ .,family=binomial(link="logit"),data=training)
print(summary(LogModel))
print(anova(LogModel, test="Chisq"))
# testing$Churn <- as.character(testing$Churn)
# testing$Churn[testing$Churn=="No"] <- "0"
# testing$Churn[testing$Churn=="Yes"] <- "1"
# testing$Churn <- as.factor(testing$Churn)
fitted.results <- predict.glm(object = LogModel,newdata = testing, type='response')
print("Confusion Matrix for Logistic Regression"); table(testing$Churn, fitted.results > 0.5)
tab.LOGIT <- table(testing$Churn, fitted.results > 0.5)
print(tab.LOGIT)
accuracy.LOGIT<-sum(diag(tab.LOGIT))/sum(tab.LOGIT)
print(accuracy.LOGIT);
#ROCR Curve
library(ROCR)
ROCRpred <- prediction(fitted.results, testing$Churn)
ROCRperf <- performance(ROCRpred, 'tpr','fpr')
print(plot(ROCRperf, colorize = TRUE, text.adj = c(-0.2,1.7)));
print(InformationValue::AUROC(testing$Churn,fitted.results));
print(exp(cbind(OR=coef(LogModel), confint(LogModel))));
}
Calling the function for LR:
LR(CHURN_MODELLING_DATA,CHURN_MODELLING_DATA$Churn30)
After calling the function, it gives the following error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor survey has new levels Just Dial
But when I debug and inspect the data being used for prediction, the levels look fine:
Browse[2]> table(training$survey)
Banners Corporate Events
30 5 32
EXISTING CLIENTS Gift Coupon - External Gift Coupon - Internal
4139 55 41
Hoardings Just Dial Just Dial N
244 0 1
News Paper No Parking Board / Way Signage Others
147 0 1259
Pamphlets Radio Adds Reference
10 1 4877
Signage SMS TV-Adds
2403 0 1
Web Site
18
The data used for prediction looks fine, yet I still get the "new levels" error.
I expect this to run without any error, because the same code runs fine when it is not wrapped in a function.
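To narrow this down, a check along the following lines could be run at the `browser()` prompt to compare the levels that actually occur in each split. This is only a sketch on toy data: `unseen_levels` is a hypothetical helper name I made up for illustration (it is not part of any package), and the toy `training` and `testing` frames stand in for the real ones.

```r
# Hypothetical helper: for every factor column, list the levels that occur
# in `test` rows but have no rows at all in `train`
unseen_levels <- function(train, test) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  out <- lapply(factor_cols, function(col) {
    seen_train <- unique(as.character(train[[col]]))
    seen_test  <- unique(as.character(test[[col]]))
    setdiff(seen_test, seen_train)
  })
  names(out) <- factor_cols
  out[lengths(out) > 0]  # keep only columns that have a problem
}

# Toy stand-ins for the real splits (assumed data, for illustration only)
survey <- factor(c("a", "b", "a", "b", "c", "c"), levels = c("a", "b", "c"))
df <- data.frame(survey = survey, visits = c(3, 1, 4, 1, 5, 9))
training <- df[1:4, ]
testing  <- df[5:6, ]

problem_levels <- unseen_levels(training, testing)
print(problem_levels)
```

An empty list would mean every level present in the test rows also has at least one training row; any entry names the factor and the levels that `predict` would reject.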