
I've fitted a logistic regression model on my training data, which has 4796 rows, and then used it to predict probabilities on my testing data, which has 29597 rows, with the following commands:

model=glm(Training_Data_BSAD$DelayFlag~Training_Data_BSAD$amount_in_local_currency,family=binomial()) 

predictTrain=predict(model, Testing_Data_BSID, type="response")

Warning message: 'newdata' had 29597 rows but variables found have 4796 rows

My training and testing datasets each have 98 variables with the same column names.
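A minimal sketch that reproduces the warning with simulated data. The names `train`, `test`, `flag` and `amount` are hypothetical stand-ins for `Training_Data_BSAD`, `Testing_Data_BSID`, `DelayFlag` and `amount_in_local_currency`. It shows why matching column names do not help here: the formula refers to the expression `train$amount`, not to a column called `amount`, so `predict()` cannot find it in the new data and falls back to the training vector.

```r
# Hypothetical stand-in data (not the question's real data)
set.seed(1)
train <- data.frame(amount = runif(50, 0, 1000))
train$flag <- rbinom(nrow(train), 1, plogis((train$amount - 500) / 200))
test <- data.frame(amount = runif(200, 0, 1000))

# Formula written with train$..., mirroring the question
bad_model <- glm(train$flag ~ train$amount, family = binomial())

# predict() evaluates `train$amount` using `test` as the lookup frame;
# `train` is not a column of `test`, so R falls back to the global
# `train` object and finds 50 values instead of 200
warn_msg <- NULL
bad_pred <- withCallingHandlers(
  predict(bad_model, test, type = "response"),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)  # keep the warning text
    invokeRestart("muffleWarning")
  }
)
print(warn_msg)
print(length(bad_pred))  # one value per *training* row
```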

Can you suggest a way to fix this error so that I get predicted results for my testing data instead of my training data? Any suggestions will be appreciated. Thank you.

  • As discussed in the answer linked by @FlorianMaas, it's because you used the data frame name in the model formula (see linked answer for details). Instead do `glm(DelayFlag ~ amount_in_local_currency, family=binomial, data=Training_Data_BSAD)` – eipi10 Jul 17 '17 at 05:52
  • Yes, but I am not able to apply it to my data. Still the same error. Can you show me how to do this? Thank you. – Kriti Shrivastava Jul 17 '17 at 05:53
  • Thanks, but I tried `glm(DelayFlag ~ amount_in_local_currency, family=binomial, data=Training_Data_BSAD)` and still get the same error. – Kriti Shrivastava Jul 17 '17 at 05:57
  • Can you post a reproducible example? Add small samples of the training and testing data to your question by pasting in the output of `dput(training_sample)`, `dput(testing_sample)` (make sure you post sample data that give the same error as your full data). – eipi10 Jul 18 '17 at 14:38
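A minimal sketch of the fix suggested in the first comment, using simulated data. `train`, `test`, `flag` and `amount` are hypothetical stand-ins for the question's data frames and columns. Because the formula uses bare column names together with `data = train`, `predict()` looks up `amount` inside `newdata` and returns one probability per testing row. If the warning still appears after a change like this, one thing worth checking is that `predict()` is called on the refitted object rather than on the original `model`.

```r
# Hypothetical stand-in data (not the question's real data)
set.seed(1)
train <- data.frame(amount = runif(50, 0, 1000))
train$flag <- rbinom(nrow(train), 1, plogis((train$amount - 500) / 200))
test <- data.frame(amount = runif(200, 0, 1000))

# Bare column names plus `data = train`: the formula now refers to a
# column called `amount`, which predict() can find inside `test`
good_model <- glm(flag ~ amount, family = binomial, data = train)

# Predict on the testing frame with the refitted model
test_prob <- predict(good_model, newdata = test, type = "response")
print(length(test_prob))  # one probability per testing row (200)
print(summary(test_prob))
```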
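The sample data requested in the last comment could be produced roughly as follows. This is a sketch on simulated data: `train`, `test`, `flag` and `amount` are hypothetical stand-ins for the question's data frames and columns. Keeping only the columns the model formula uses keeps the `dput()` output short, since the real datasets have 98 columns each.

```r
# Hypothetical full-size stand-ins with the question's row counts
set.seed(1)
train <- data.frame(amount = runif(4796, 0, 1000))
train$flag <- rbinom(nrow(train), 1, 0.3)
test <- data.frame(amount = runif(29597, 0, 1000))

# A few rows, and only the columns the model formula needs
training_sample <- head(train[, c("flag", "amount")], 10)
testing_sample  <- head(test[, "amount", drop = FALSE], 10)

# Text that can be pasted into the question and rebuilt in R
dput(training_sample)
dput(testing_sample)
```

Anyone reading the question can paste the printed `structure(...)` text back into R to recreate both samples.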

0 Answers