
I used the ApacheData data set (83,784 rows) to build a linear regression model:

fit <- lm(tomorrow_apache ~ as.factor(state_today)
          + as.numeric(daily_creat)
          + as.numeric(last1yr_min_hosp_icu_MDRD)
          + as.numeric(bun)
          + as.numeric(urin)
          + as.numeric(category6)
          + as.numeric(category7)
          + as.numeric(other_fluid)
          + as.factor(daily)
          + as.factor(age)
          + as.numeric(apache3)
          + as.factor(mv)
          + as.factor(icu_loc)
          + as.factor(liver_tr_before_admit)
          + as.numeric(min_GCS)
          + as.numeric(min_PH)
          + as.numeric(previous_day_creat)
          + as.numeric(previous_day_bun),
          data = ApacheData)

I want to use this model to predict the outcome for a new input, so I give each predictor variable a value:

predict(fit, data=data.frame(state_today=1, daily_creat=2.3, last1yr_min_hosp_icu_MDRD=3,
                             bun=10, urin=0.01, category6=10, category7=20, other_fluid=0,
                             daily=2, age=25, apache3=12, mv=1, icu_loc=1,
                             liver_tr_before_admit=0, min_GCS=20, min_PH=3,
                             previous_day_creat=2.1, previous_day_bun=14))

I expect a single predicted value for this new input, but I get a long vector of predictions instead. I don't know why this is happening. What am I doing wrong?

Thanks a lot for your time!

user54626
  • See `?predict.lm`: there is no argument called `data`. We will need a reproducible example to help you. It's cleaner if you prepare your data _before_ you send it to `lm`. Here's a page that might help you make an example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Roman Luštrik Apr 19 '14 at 09:19
  • Try `newdata` instead of `data`. – James King Apr 19 '14 at 10:33
  • I used `newdata` instead of `data` and now I get an error: `Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor as.factor(daily) has new level 2` – user54626 Apr 19 '14 at 21:21
  • `daily` is equal to 2 in the original data that I built the `lm` model with, and I cut the original data by running `ApacheData$daily <- cut(ApacheData$daily, breaks=c(-1, 0, 1, 2, 3, 9, 3000))`, so 2 should still be among the levels. Why does it say that `daily` has a new level 2? – user54626 Apr 19 '14 at 21:25
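
A likely explanation for the "new level 2" error in the comments above, sketched on toy values (the vector `daily_raw` is a stand-in, not the real `ApacheData$daily`): `cut()` relabels the values as intervals such as `(1,2]`, so a raw `2` in the new data no longer matches any level the model was fitted on. Applying the same `cut()` to the new value should give a level the model recognises.

```r
# Toy values standing in for ApacheData$daily (an assumption for illustration).
daily_raw <- c(0, 1, 2, 3, 5, 100)
breaks <- c(-1, 0, 1, 2, 3, 9, 3000)

# After cut(), the levels are interval labels, not the original numbers.
daily_cut <- cut(daily_raw, breaks = breaks)
print(levels(daily_cut))

# A raw 2 converted with as.factor() has the single level "2",
# which is not among the interval labels above.
raw_level <- levels(as.factor(2))
print(raw_level %in% levels(daily_cut))

# Cutting the new value with the same breaks yields a known level.
new_daily <- cut(2, breaks = breaks)
print(as.character(new_daily))
```

With the real data, the same idea would mean passing `daily = cut(2, breaks = c(-1, 0, 1, 2, 3, 9, 3000))` in the new data frame rather than `daily = 2`.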

1 Answer


You may also want to try the excellent effects package in R (?effects). It's very useful for graphing the predicted values from your model while setting the inputs on the right-hand side of the equation to particular values. (For a linear model fitted with lm these are predicted values of the outcome, not probabilities.) I can't reproduce the example you've given in your question, but to give you an idea of how to quickly extract predicted values in R and then plot them (since this is vital to understanding what they mean), here's a toy example using the Prestige data set, which is available once the package is loaded:

install.packages("effects") # installs the "effects" package in R
library(effects) # loads the "effects" package
data(Prestige) # loads the Prestige data set (available once "effects" is loaded)
m <- lm(prestige ~ income + education + type, data=Prestige)

# this last step creates predicted values of the outcome over a range of values
# of the "income" variable, holding the other inputs fixed at typical values
# (means for the numeric predictors)
eff <- effect("income", m, default.levels=10)
plot(eff) # graphs the predicted values
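
For the single-observation case in the question, here is a minimal sketch of `predict()` with `newdata`. It uses the built-in `mtcars` data as a stand-in for ApacheData (an assumption for illustration, as are the names `cars` and `new_obs`), and it converts the factor before fitting rather than inside the formula, as suggested in the comments.

```r
# Built-in mtcars data stands in for ApacheData (an assumption for illustration).
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # convert to a factor BEFORE fitting

fit <- lm(mpg ~ wt + hp + cyl, data = cars)

# One new observation; the factor is built with the levels used for fitting.
new_obs <- data.frame(
  wt  = 3.0,
  hp  = 120,
  cyl = factor("6", levels = levels(cars$cyl))
)

# `newdata` is the argument predict.lm() looks at: one row in, one value out.
pred_one <- predict(fit, newdata = new_obs)

# `data` is not an argument of predict.lm(), so it is ignored and the
# fitted values for every row of the training data come back instead.
pred_all <- predict(fit, data = new_obs)

print(length(pred_one))
print(length(pred_all))
```

The second call mirrors what happens in the question: because `newdata` is missing, the result has one prediction per training row.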
statsRus