This has been explained before for linear regression, but not for logistic regression. None of my columns contain NA values, because I impute missing data with "missForest". My scenario is as follows:

  1. The survey variable has 19 levels.
  2. After splitting the data into training and testing sets, a few levels have 0 rows in the training set.
  3. Logistic regression therefore drops these levels automatically, because they have no rows in the training dataset.
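
The three points above can be reproduced on a toy data set. This is only a minimal sketch in base R: the data frame `toy`, its columns, and the level names `a`, `b`, `c` are made up for illustration and are not the real survey data.

```r
# Toy data (hypothetical): level "c" is declared in the factor
# but has no rows, like a survey level with a count of 0 in training.
set.seed(1)
toy <- data.frame(
  y = rbinom(40, 1, 0.5),
  survey = factor(rep(c("a", "b"), 20), levels = c("a", "b", "c"))
)

table(toy$survey)   # "c" is still listed, with a count of 0

fit <- glm(y ~ survey, family = binomial(link = "logit"), data = toy)

# glm() builds its model frame with drop.unused.levels = TRUE,
# so the empty level is not recorded in the fitted model.
fit$xlevels$survey

# Predicting for a row that uses the empty level fails.
new_rows <- data.frame(survey = factor("c", levels = c("a", "b", "c")))
res <- tryCatch(
  predict(fit, newdata = new_rows, type = "response"),
  error = function(e) conditionMessage(e)
)
res                 # a "has new level(s)" message from model.frame
```

Note that `table()` lists a level even when its count is 0, so a level can look present while the fitted model knows nothing about it.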

How can I handle this?

The function for LR is as follows:

LR <- function(df, churnCol){
  # browser();
  CM <- df
  CM$Churn <- churnCol
  str(CM)
  names(CM)
  detach()
  attach(CM)
  CM_LOGIT = subset(CM, select = c(SalonID,Sex,Spl_instruction,category,survey,member_check,Studio_Client,`mean(Total)`,No_Of_Visits,Churn))
  CM_LOGIT <- as.data.frame(CM_LOGIT,stringsAsFactors = T)

  str(CM_LOGIT)

  sapply(CM_LOGIT,function(x) sum(is.na(x)))

  df.CM_LOGIT <- missForest(CM_LOGIT)
  summary(df.CM_LOGIT$ximp)
  df.CM_LOGIT <- df.CM_LOGIT$ximp

  setnames(df.CM_LOGIT, "mean(Total)", "mean_total")
  summary(df.CM_LOGIT)
  sapply(df.CM_LOGIT,function(x) sum(is.na(x)))

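  # Set the seed before the random split; seeding afterwards leaves the
  # partition different on every run.
  set.seed(2019)
  intrain<- createDataPartition(df.CM_LOGIT$Churn,p=0.7,list=FALSE)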
  training<- df.CM_LOGIT[intrain,]
  testing<- df.CM_LOGIT[-intrain,]

  dim(training); dim(testing);

  LogModel <- glm(Churn ~ .,family=binomial(link="logit"),data=training)
  print(summary(LogModel))

  print(anova(LogModel, test="Chisq"))

  # testing$Churn <- as.character(testing$Churn)
  # testing$Churn[testing$Churn=="No"] <- "0"
  # testing$Churn[testing$Churn=="Yes"] <- "1"
  # testing$Churn <- as.factor(testing$Churn) 
  fitted.results <- predict.glm(object =  LogModel,newdata =  testing, type='response')
  print("Confusion Matrix for Logistic Regression"); table(testing$Churn, fitted.results > 0.5)
  tab.LOGIT <- table(testing$Churn, fitted.results > 0.5)
  print(tab.LOGIT)
  accuracy.LOGIT<-sum(diag(tab.LOGIT))/sum(tab.LOGIT)
  print(accuracy.LOGIT);

  #ROCR Curve
  library(ROCR)
  ROCRpred <- prediction(fitted.results, testing$Churn)
  ROCRperf <- performance(ROCRpred, 'tpr','fpr')
  print(plot(ROCRperf, colorize = TRUE, text.adj = c(-0.2,1.7)));
  print(InformationValue::AUROC(testing$Churn,fitted.results));

  print(exp(cbind(OR=coef(LogModel), confint(LogModel))));
}

Calling the function for LR:

LR(CHURN_MODELLING_DATA,CHURN_MODELLING_DATA$Churn30)

After calling the function, it gives this error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor survey has new levels Just Dial
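
One way to see which values trigger this error is to compare the levels stored in the fitted model (`xlevels`) with the values that actually occur in the data passed to `predict()`. The helper `unseen_levels()` below is a hypothetical name, and the `train`/`test` data frames are toy stand-ins, so treat this as a sketch rather than a drop-in fix.

```r
# Hypothetical helper: for each factor used by the model, return the values
# in `newdata` that the fitted model has never seen.
unseen_levels <- function(model, newdata) {
  out <- lapply(names(model$xlevels), function(v) {
    setdiff(unique(as.character(newdata[[v]])), model$xlevels[[v]])
  })
  names(out) <- names(model$xlevels)
  Filter(length, out)   # keep only factors that have unseen values
}

# Toy stand-ins for the training and testing sets (made-up values).
lv <- c("a", "b", "c")
train <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 0, 1),
  survey = factor(c("a", "b", "a", "b", "a", "b", "a", "b"), levels = lv)
)
test <- data.frame(
  y = c(1, 0),
  survey = factor(c("a", "c"), levels = lv)
)

fit <- glm(y ~ survey, family = binomial(link = "logit"), data = train)
unseen <- unseen_levels(fit, test)
unseen
```

Inside the function from the question, the equivalent call would be `unseen_levels(LogModel, testing)`, run just before `predict.glm()`.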

But when I debug and inspect the data used for prediction, the levels look fine:

Browse[2]> table(training$survey)

                   Banners                      Corporate                         Events 
                        30                              5                             32 
          EXISTING CLIENTS         Gift Coupon - External         Gift Coupon - Internal 
                      4139                             55                             41 
                 Hoardings                      Just Dial                    Just Dial N 
                       244                              0                              1 
                News Paper No Parking Board / Way Signage                         Others 
                       147                              0                           1259 
                 Pamphlets                     Radio Adds                      Reference 
                        10                              1                           4877 
                   Signage                            SMS                        TV-Adds 
                      2403                              0                              1 
                  Web Site 
                        18 

The data for prediction looks fine, yet I still get the new-levels error.

I expect it to run without any error, because the same code runs fine when it is not wrapped in a function.
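
One possible workaround, sketched under assumptions: before predicting, keep only the test rows whose factor values are known to the fitted model. The helper name `keep_known_levels()` and the toy data are hypothetical. Dropping rows changes the evaluation set, so merging sparse levels into an existing bucket such as "Others" before splitting may be preferable in practice.

```r
# Hypothetical helper: keep only the rows of `newdata` whose factor values
# all appear in the levels recorded by the fitted model.
keep_known_levels <- function(model, newdata) {
  ok <- rep(TRUE, nrow(newdata))
  for (v in names(model$xlevels)) {
    ok <- ok & as.character(newdata[[v]]) %in% model$xlevels[[v]]
  }
  newdata[ok, , drop = FALSE]
}

# Toy stand-ins for the training and testing sets (made-up values).
lv <- c("a", "b", "c")
train <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 0, 1),
  survey = factor(c("a", "b", "a", "b", "a", "b", "a", "b"), levels = lv)
)
test <- data.frame(
  y = c(1, 0, 1),
  survey = factor(c("a", "c", "b"), levels = lv)
)

fit <- glm(y ~ survey, family = binomial(link = "logit"), data = train)

test_ok <- keep_known_levels(fit, test)
n_dropped <- nrow(test) - nrow(test_ok)   # rows removed because of unseen levels
pred <- predict(fit, newdata = test_ok, type = "response")
```

In the function from the question this would correspond to `testing <- keep_known_levels(LogModel, testing)` before the call to `predict.glm()`, with the confusion matrix and ROC computed on the filtered rows.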

Mark Rotteveel