I built a regression model using my training dataset, and I want to use this model to get predicted values for my testing dataset. That way, I can compare the predicted values with the actual values in the testing dataset and find the differences between them. However, I don't know how to plug values from the testing dataset into the model without using a for loop.

Here is my regression model:

lm.HOSPITAL <- lm(train_HOSPITAL$dailyQty ~ train_HOSPITAL$DC_STATE + train_HOSPITAL$TYPE_340B_CDE_DESC + train_HOSPITAL$geoState + train_HOSPITAL$IsFriSat)
Maurits Evers
Frank
  • `predict(lm.HOSPITAL)`? Or `predict(lm.HOSPITAL) - train_HOSPITAL$dailyQty` for the differences? – Maurits Evers Mar 05 '18 at 23:31
  • Welcome to SO! Although the below answers might have already helped you, please consider taking a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. It is always very useful for us to have a minimal dataset which would help us to reproduce your question. But maybe it is me who missed something now because so far no one else complained about it... which is rare;) – tjebo Mar 06 '18 at 01:03

2 Answers

First, predict the response on your train or test data, depending on which dataset you want to compare actual vs. predicted values for:

predict_train = predict(lm.HOSPITAL, newdata = train_HOSPITAL)

Then subtract the two: difference = predict_train - train_HOSPITAL$dailyQty

You can do the same to see the prediction difference on your test data as well.
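
As a rough sketch of that comparison on held-out data, the snippet below uses the built-in mtcars dataset as a stand-in. The split, the variables mpg, wt and hp, and the names fit and comparison are illustrative assumptions, not the asker's data. It also assumes the model is fitted with bare column names and a data argument, so that predict can read the test rows.

```r
# Stand-in data: built-in mtcars, not the hospital data from the question
set.seed(42)                                 # reproducible split
train_rows <- sample(nrow(mtcars), 24)       # 24 rows to train on, the rest to test
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit on the training rows only
fit <- lm(mpg ~ wt + hp, data = train)

# One predicted value per test row, side by side with the actual value
comparison <- data.frame(
  actual    = test$mpg,
  predicted = predict(fit, newdata = test)
)
comparison$difference <- comparison$predicted - comparison$actual

# Two common summaries of the differences
mae  <- mean(abs(comparison$difference))     # mean absolute error
rmse <- sqrt(mean(comparison$difference^2))  # root mean squared error
```

Everything here is vectorised, so no for loop is needed; head(comparison) shows the first few rows of the comparison.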

RomRom

The problem you will face is that the original model has no data argument: every term in the formula is hard-coded as train_HOSPITAL$..., so predict has no way to match the columns of a "newdata" argument to the model's variables. (Who taught you to use $ in an lm-formula?) Instead, run the model this way:

lm.HOSPITAL <- lm( dailyQty ~ DC_STATE + TYPE_340B_CDE_DESC + geoState + IsFriSat, data=train_HOSPITAL)

Then with a newdata-dataframe use predict to get your desired response at levels of those variables:

 predict( lm.HOSPITAL , newdata= data.frame( DC_STATE=  # values
                                            , TYPE_340B_CDE_DESC= # values
                                            , geoState= #values
                                            , IsFriSat= #values
          )                                  )

Or if you already have a "test_data"-dataframe, then just:

predict( lm.HOSPITAL , newdata= test_data)
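
To see the difference between the two ways of writing the formula, here is a small sketch on the built-in mtcars dataset. The data, the split and the names fit_dollar and fit_data are illustrative assumptions, not the asker's objects. With the `$` formula, predict cannot find the model's variables in newdata and falls back to the training object, so it returns one value per training row (with a warning) rather than one per test row.

```r
# Stand-in data: built-in mtcars, split into 24 training and 8 test rows
set.seed(1)
train_rows <- sample(nrow(mtcars), 24)
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Formula written with `$`: the terms are tied to the `train` object itself
fit_dollar <- lm(train$mpg ~ train$wt)
# newdata has no column named `train$wt`, so R re-uses `train` and warns;
# the warning is suppressed here only to keep the output quiet
pred_dollar <- suppressWarnings(predict(fit_dollar, newdata = test))

# Formula with bare column names plus a data argument
fit_data  <- lm(mpg ~ wt, data = train)
pred_data <- predict(fit_data, newdata = test)

# Compare how many predictions each approach returns
length(pred_dollar) == nrow(train)  # predictions for the training rows
length(pred_data)   == nrow(test)   # predictions for the test rows
```

Only the second fit gives predictions that line up row by row with test$mpg, which is what a predicted-minus-actual comparison needs.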
IRTFM