
I want to calculate a few predicted values based on a regression model estimated in R using lm().

The points to be predicted are not included in the dataset used for the regression, although I suppose they could be, with NAs standing in for the dependent variable. That works in SAS, but I'd rather not do it in this instance.

The simple approach below initially worked well for my simple purpose.

myLm1 = lm(log(p) ~ u + v + w, data=myDat)

DatToPred1 = data.frame(u=72, v=20, w=85)

predict(myLm1, DatToPred1)

But suppose now the model specification includes an interaction x*y. The lines below throw an error.

myLm2 = lm(log(p) ~ u + v + w + x*y, data=myDat)

DatToPred2 = data.frame(u=72, v=20, w=85, x=1, y=45)

predict(myLm2, DatToPred2)

Error in data.frame(u=72, v=20, w=85, x=1, y=45,  : 
  argument is missing, with no default

This seems strange: since lm() can find x and y to form x*y, it seems predict() should be able to do the same.

Incidentally, including x*y in the definition of DatToPred2 as below also fails.

DatToPred2 = data.frame(u=72, v=20, w=85, x*y=45)

Finally, suppose the model has been further augmented to include a full suite of dummies for the categorical variable z.

myLm3 = lm(log(p) ~ u + v + w + x*y + as.factor(z), data=myDat)

I'm at a loss for a way to specify the values for a point to be estimated. Further, z can take on a large number n of values and listing all the values for its dummies corresponding to a particular point to be predicted would be tedious:

   d_z1=0, d_z2=0, ... , d_zi=1, d_z(i+1)=0, ... , d_zn=0

And in any event I don't know how R would expect to see these dummies named in the data.frame() definition for the point to be predicted.

The time will come when there are a large number of points to be predicted and their values will be stored together in a dataframe. But at this point it would be a great advance to find a way to predict a single point in a model with interactions and as.factor's.

There are a lot of online examples involving lm() and predict(), but those I've found tend not to involve the tweaks raised here.

Thanks in advance.

eipi10
jackw19
  • I cannot reproduce your error. Predict should work fine with interaction terms. You should define `mydat` here so we can run the same code as you. See [how to make a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Jun 29 '15 at 20:14

1 Answer


You haven't provided a reproducible example (i.e., data and code that allows others to reproduce your error), but I don't have a problem when I try something similar with a built-in data frame:

m1 = lm(mpg ~ wt + carb + qsec*hp, data=mtcars)

pred.dat=data.frame(carb=2, hp=120, qsec=10, wt=2.5)

predict(m1, newdata=pred.dat)

1 
21.46763 
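As a rough check that predict() builds the interaction term itself from the raw qsec and hp columns, the same prediction can be reconstructed by hand from the fitted coefficients. This is only a sketch on mtcars; the name by.hand is made up for illustration.

```r
# Same model as above, fitted on the built-in mtcars data.
m1 <- lm(mpg ~ wt + carb + qsec*hp, data = mtcars)

# newdata holds only the raw predictors; no qsec:hp column is supplied.
pred.dat <- data.frame(carb = 2, hp = 120, qsec = 10, wt = 2.5)

# Rebuild the prediction from the coefficients (by.hand is an illustrative name).
b <- coef(m1)
by.hand <- b["(Intercept)"] + b["wt"] * 2.5 + b["carb"] * 2 +
  b["qsec"] * 10 + b["hp"] * 120 + b["qsec:hp"] * 10 * 120

# Compare with predict(); the two should agree.
all.equal(unname(predict(m1, newdata = pred.dat)), unname(by.hand))
```

So the data frame passed as newdata only needs the variables that appear in the formula, named exactly as they are in the original data.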

To predict with categorical variables, just supply the category for which you want a prediction:

m2 = lm(Sepal.Length ~ Petal.Length + Species, data=iris)

pred.dat = data.frame(Petal.Length=1.2, Species="setosa")
predict(m2, newdata=pred.dat)
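The same should hold when the factor is created inside the formula with as.factor(), as in your myLm3. Since I don't have your data, here is a sketch in which mtcars$cyl stands in for your z (an assumption on my part); the names m3 and new.pt are made up. You supply the raw value of the variable, and R builds the dummies for you:

```r
# cyl plays the role of the categorical variable z from the question.
m3 <- lm(mpg ~ wt + qsec*hp + as.factor(cyl), data = mtcars)

# Supply the raw value of cyl. predict() re-applies as.factor() from the
# formula and lines the level up with those seen when the model was fitted.
new.pt <- data.frame(wt = 2.5, qsec = 18, hp = 120, cyl = 6)
p <- predict(m3, newdata = new.pt)
p
```

Note that the value has to be one that occurred in the fitting data. Asking for cyl = 5 here, for example, stops with an error about a new factor level.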

If you want predictions for all combinations of a set of variables (including categorical/dummy variables), use expand.grid to generate all the combinations:

pred.dat = expand.grid(Petal.Length=1:5, Species=unique(iris$Species))
predict(m2, newdata=pred.dat)
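Once you have many points stored in a data frame, one convenient pattern (a sketch, not the only way to do it) is to attach the predictions as a new column so each fitted value stays next to its inputs. The column name fit is arbitrary:

```r
m2 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# One row per combination of Petal.Length and Species.
pred.dat <- expand.grid(Petal.Length = 1:5, Species = unique(iris$Species))

# predict() returns one value per row of newdata, in the same order.
pred.dat$fit <- predict(m2, newdata = pred.dat)
head(pred.dat)
```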
eipi10
  • Thank you, eipi10. The problem with the interaction variable was a syntax error; my bad. The equivalent of your suggestion Species="setosa" in my dataset worked a treat for the as.factor part of the question. Thanks again. – jackw19 Jun 30 '15 at 00:42