
I am having some trouble with the following code:

```r
model4 = glm(data = data16, Loan_Status_Coded ~ Coapplicant_Income_Modified +
               Dependents_SelfEmployed_1 + Dependents_Imputed_0_Dummy +
               Dependents_Imputed_1_Dummy + Dependents_Imputed_2_Dummy +
               Self_Employed_Imputed_Coded + Credit_History_Married + Married_Imputed_Coded +
               sqrt_LoanAmount_Imputed + Loan_Amount_Term_Imputed_Low_Dummy +
               Loan_Amount_Term_Imputed_Medium_Dummy + Credit_History_Imputed +
               Education_Coded + Property_Area_Semiurban_Dummy + Property_Area_Rural_Dummy,
             family = binomial(link = "logit"))

summary(model4)
predict5 = predict(data = data16, model4, type = "response")
table(data16$Loan_Status_Coded, predict5 > 0.5)
```

Running the `table` function gives the following error: "all arguments must have the same length". It seems that `predict5` has fewer elements than `data16` has rows. If I use `predict5 = predict(newdata = data16, model4, type = "response")`, the error does not occur, but the number of data points decreases. For instance, the output when using `newdata` is:

```
    FALSE TRUE
  0    40   39
  1     7  176
```

but data16 has 614 rows (the table above accounts for only 40 + 39 + 7 + 176 = 262 of them). What am I doing wrong here?
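A minimal sketch reproducing the mismatch on a small hypothetical data frame `d` (a stand-in for data16, which isn't available here), assuming the only problem is a missing predictor value. `predict.glm()` has no `data` argument, so `data = d` falls into `...` and is ignored; without `newdata` it returns fitted values only for the rows `glm()` kept after the default `na.omit` handling dropped incomplete ones. With `newdata`, every row gets a prediction, and incomplete rows come back as `NA`.

```r
# Hypothetical toy data (not data16): row 10 has a missing predictor value
d <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  x = c(0.5, 1.2, 1.4, 2.1, -0.5, -1.0, 1.5, 0.2, 0.9, NA)
)

# Default na.action (na.omit) silently drops row 10 before fitting
fit <- glm(y ~ x, data = d, family = binomial(link = "logit"))

# `data =` is not an argument of predict.glm(); it is swallowed by `...`,
# so this returns fitted values for the 9 rows actually used in the fit
p_fit <- predict(fit, data = d, type = "response")

# `newdata =` predicts every row; the incomplete row comes back as NA
p_new <- predict(fit, newdata = d, type = "response")

length(p_fit)      # 9
length(p_new)      # 10
sum(is.na(p_new))  # 1

# Pairing 10 responses with 9 predictions reproduces the original error
res <- tryCatch(
  table(d$y, p_fit > 0.5),
  error = function(e) conditionMessage(e)
)
res                # "all arguments must have the same length"
```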

Brian Tompsett - 汤莱恩
Kenneth Singh
  • Do you have any `NA` values in your input? – Ben Bolker Oct 21 '16 at 11:46
  • No, all the NA values have already been imputed. – Kenneth Singh Oct 21 '16 at 11:47
  • try `predict(model4, newdata=data16, type="response")` – Ben Bolker Oct 21 '16 at 11:48
  • Using `newdata = data16` has dealt with the error, but the number of rows has decreased. The output is: FALSE TRUE / 0 40 39 / 1 7 176, while data16 originally had 614 rows. – Kenneth Singh Oct 21 '16 at 11:51
  • Is there any chance you can provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) ? Can you edit your question to incorporate the correction/new information in the comments? – Ben Bolker Oct 21 '16 at 12:02
  • try `table(data16$Loan_Status_Coded, predict5>0.5,useNA="always")` ? Are you *sure* there are no `NA` values in `data16` ? – Ben Bolker Oct 21 '16 at 12:13
  • Thank you so much! There were indeed NA values. The culprit was the variable "CoapplicantIncome" in data16. It's working now. – Kenneth Singh Oct 21 '16 at 12:39
  • You can post an answer to your own question, then (you may have to wait a little while before the system lets you do so), or delete it ... – Ben Bolker Oct 21 '16 at 12:53
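To illustrate the `useNA` suggestion, here is a sketch on hypothetical toy data (not data16) with one missing predictor: by default `table()` silently drops `NA` values, which is why counts like those in the question can add up to fewer than the number of rows, while `useNA = "always"` adds an `<NA>` row and column that expose the missing cases.

```r
# Hypothetical toy data (not data16): row 10 has a missing predictor value
d <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  x = c(0.5, 1.2, 1.4, 2.1, -0.5, -1.0, 1.5, 0.2, 0.9, NA)
)
fit <- glm(y ~ x, data = d, family = binomial(link = "logit"))

# One prediction per row; row 10 is NA because its predictor is missing
p_new <- predict(fit, newdata = d, type = "response")

# Default useNA = "no": the NA prediction is silently left out of the counts
tab_default <- table(d$y, p_new > 0.5)
sum(tab_default)   # 9

# useNA = "always" adds <NA> margins so every row is accounted for
tab_na <- table(d$y, p_new > 0.5, useNA = "always")
sum(tab_na)        # 10
tab_na
```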

1 Answer


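The culprit here was `NA` values in one of the variables in data16. `glm()` silently drops rows with missing values (the default `na.omit` behaviour), so the fitted values, and hence `predict5`, had fewer elements than data16 had rows; with `newdata`, the incomplete rows came back as `NA`, which `table()` drops by default. It's working fine after dealing with the `NA` values.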
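A sketch of two follow-up steps, again on hypothetical toy data rather than data16. First, counting `NA`s per column locates the culprit variable. Second, as an alternative to imputing (a swapped-in option, not what was done here), fitting with `na.action = na.exclude` still drops incomplete rows for estimation but pads predictions back with `NA`, so `predict()` without `newdata` stays aligned with the rows of the data.

```r
# Hypothetical toy data (not data16): column x has one missing value
d <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  x = c(0.5, 1.2, 1.4, 2.1, -0.5, -1.0, 1.5, 0.2, 0.9, NA)
)

# Count missing values per column to find the culprit variable
na_counts <- colSums(is.na(d))
na_counts          # y = 0, x = 1

# na.exclude: incomplete rows are skipped when fitting, but fitted values
# and predictions are padded with NA back to the original row count
fit <- glm(y ~ x, data = d, family = binomial(link = "logit"),
           na.action = na.exclude)
p <- predict(fit, type = "response")
length(p)          # 10, one value per row of d

# Lengths now match, so table() works; useNA = "ifany" shows the NA case
table(d$y, p > 0.5, useNA = "ifany")
```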

Kenneth Singh