
I am trying to fit a logistic regression to my data, but I get this error:

logistic <- glm(response ~ ., data = df_without, family = 'binomial')

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Unlike the two other questions on this topic (here and here), I have >=2 levels for all my factors, and my response variable has 2 levels as well:

summary(df_without$response)
     0      1 
123534  64591 


summary(df_without[sapply(df_without, is.factor)])

[Image: screenshot of the summary() output for the factor columns; not reproduced here]

My dataframe is available here as an .Rdata file.

sessionInfo()

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils         datasets  methods   base     

other attached packages:
 [1] Amelia_1.7.3        Rcpp_0.11.6         randomForest_4.6-10 e1071_1.6-4         plyr_1.8.2         
 [6] gbm_2.1.1           survival_2.38-1     glmnet_2.0-2            foreach_1.4.2       Matrix_1.2-0       
[11] caret_6.0-47        ggplot2_1.0.1       lattice_0.20-31     lubridate_1.3.3     RJDBC_0.2-5        
[16] rJava_0.9-6         DBI_0.3.1          

loaded via a namespace (and not attached):
 [1] compiler_3.2.0      nloptr_1.0.4        class_7.3-12        iterators_1.0.7     tools_3.2.0        
 [6] digest_0.6.8        lme4_1.1-7          memoise_0.2.1       nlme_3.1-120        gtable_0.1.2       
[11] mgcv_1.8-6          brglm_0.5-9         SparseM_1.6         proto_0.3-10        BradleyTerry2_1.0-6
[16] stringr_1.0.0       gtools_3.5.0        grid_3.2.0          nnet_7.3-9          foreign_0.8-63     
[21] minqa_1.2.4         reshape2_1.4.1      car_2.0-25          magrittr_1.5        scales_0.2.4       
[26] codetools_0.2-11    MASS_7.3-40         pbkrtest_0.4-2      colorspace_1.2-6    quantreg_5.11      
[31] stringi_0.4-1       munsell_0.4.2 
skunkwerk

1 Answer


I know that code-only answers are hunted down and dealt with severely by the moderators, but this really does speak for itself:

 # I did download the excessively large file
> table(df_without[ complete.cases(df_without), 'pymnt_plan'])

        n    y 
   0 1231    0 
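
The table above shows that, among the complete cases, `pymnt_plan` only ever takes the value `n`. As far as I understand it, the model-fitting functions drop rows with missing values by default and then drop unused factor levels, which leaves a one-level factor and triggers the contrasts error. A minimal sketch of a general check follows; `toy` is a made-up stand-in for `df_without`, not the asker's data, and the variable names `used`, `n_levels` and `single_level` are only illustrative.

```r
# Hypothetical toy data standing in for df_without
toy <- data.frame(
  response   = factor(c(0, 1, 0, 1, 0, 1)),
  pymnt_plan = factor(c("n", "n", "n", "n", "y", "y")),
  mths_since_last_delinq = c(3, 10, 7, 1, NA, NA)
)

# Keep only the rows that survive the default NA handling (na.omit)
used <- toy[complete.cases(toy), ]

# Count the distinct values each factor actually takes in those rows
is_fac   <- vapply(used, is.factor, logical(1))
n_levels <- vapply(used[is_fac], function(x) length(unique(x)), integer(1))

# Factors with fewer than 2 observed levels cannot be given contrasts
single_level <- names(n_levels)[n_levels < 2]
print(single_level)
```

Any column named in `single_level` either has to be left out of the formula or the missing values that wipe out its other levels have to be dealt with first.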
IRTFM
  • ... and therefore `logistic <- glm(response ~. - pymnt_plan, data = df_without, family='binomial')` should work nicely – Ben Bolker Jun 21 '15 at 02:36
  • @BenBolker: Well, maybe not. There are three columns with lots of missing data: `$ mths_since_last_delinq $ mths_since_last_record $ mths_since_last_major_derog`, so I suspect some effort needs to be focused on understanding what that actually means. It may be a favorable feature in terms of risk assessment to never have had a delinquency or a "derog" (whatever a 'derog' might be, but it doesn't sound good). The right answer may be ... Look at your data _First_; don't wait until you get a flaky result. – IRTFM Jun 21 '15 at 16:24
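
The "look at your data first" advice in the comment above can be sketched as a quick count of missing values per column, and of how many rows would survive listwise deletion. The data frame below is hypothetical (only two of the column names come from the thread; `loan_amnt` and all values are invented for illustration).

```r
# Hypothetical toy data; values are invented
toy <- data.frame(
  mths_since_last_delinq = c(3, NA, NA, 1),
  mths_since_last_record = c(NA, NA, NA, 20),
  loan_amnt              = c(1000, 2500, 4000, 1200)  # assumed complete column
)

# Number of missing values in each column, worst offenders first
na_counts <- colSums(is.na(toy))
print(sort(na_counts, decreasing = TRUE))

# Rows that remain once every row containing an NA is dropped
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

If `n_complete` is a small fraction of `nrow(toy)`, the model is being fit on a very different (and possibly unrepresentative) subset of the data.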