I have reviewed the following similar questions and answers but believe my situation is different enough to warrant another question.

Getting Warning: " 'newdata' had 1 row but variables found have 32 rows" on predict.lm in R

R Warning: newdata' had 15 rows but variables found have 22 rows

Warning message 'newdata' had 1 row but variables found have 16 rows in R

warning when calculating predicted values

Trouble using predict with linear model in R

Predict.lm in R fails to recognize newdata

The last question listed has a great answer from Joran that gets to the heart of the naming convention between what was modeled and what is being scored.

The model I am fitting is a polynomial, which causes some naming problems.

mdl <- lm(val ~ poly(grp,2), data = mRetCurv)
model.frame(mdl)

Generates the following output:

     val       poly(grp, 2).1  poly(grp, 2).2
1   39.54227   -0.290170670    0.374017601
2   48.68225   -0.272368788    0.308602552

Note the names of my predictor variables. If I call

predict.lm(mdl, newdata = apl$grp)

I get the standard warning because, as far as predict.lm is concerned, the variable grp is not the same as poly(grp, 2).1 or poly(grp, 2).2. I tried duplicating the grp column and renaming the two copies to match the model.frame output, but R doesn't like "poly(grp, 2).1" as a column name. Replicating a column is also not a data-efficient solution when I apply it to many rows.

Any help is appreciated.

Thank you

VPMACH
  • I think `newdata` needs to be a data frame rather than a vector. What happens if you run `predict(mdl, newdata = apl)`? – eipi10 Nov 13 '17 at 19:01
  • apl is a data frame and has a variable grp in it already. That said, I just tried what you said and dropped the apl$grp for just apl and it worked perfectly. If you add it as an answer instead of a comment I'd be more than happy to mark it correct for you. – VPMACH Nov 13 '17 at 19:05

1 Answer

apl$grp is a vector, but predict requires the newdata argument to be a data frame.* This data frame must contain columns with the same names as the predictor variables used to fit the model (though it can contain other columns as well). So, the following code should work:

predict(mdl, newdata = apl)
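
For a reproducible illustration, here is a minimal sketch that uses the built-in mtcars data as a stand-in for mRetCurv and apl (the names fit, new_obs and pred are made up for this example). It assumes the only thing you have is a vector of new predictor values, in which case wrapping it in a data frame under the original variable name should be enough:

```r
# mtcars stands in for the question's data; hp plays the role of grp.
fit <- lm(mpg ~ poly(hp, 2), data = mtcars)

# The new data needs a column named after the raw predictor (hp),
# not after the poly(hp, 2).1 / poly(hp, 2).2 columns of model.frame().
new_obs <- data.frame(hp = c(100, 150, 200))

# predict() rebuilds the polynomial basis for the new rows itself.
pred <- predict(fit, newdata = new_obs)
length(pred)  # one prediction per row of new_obs
```

There is no need to create or rename poly(...) columns by hand; the formula stored in the model is re-evaluated against whatever data frame is supplied.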

You can use predict rather than predict.lm. mdl is an object of class lm, which causes predict to "dispatch" the predict.lm method automatically.
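
A small sketch of that dispatch, again on mtcars rather than the question's data (fit, new_obs, via_generic and via_method are illustrative names only):

```r
fit <- lm(mpg ~ poly(hp, 2), data = mtcars)
new_obs <- data.frame(hp = c(100, 150, 200))

# The class attribute is what the generic uses to pick a method.
class(fit)

# Calling the generic and calling the method directly give the same result.
via_generic <- predict(fit, newdata = new_obs)
via_method  <- predict.lm(fit, newdata = new_obs)
same <- identical(via_generic, via_method)
same
```

Calling the generic is usually preferable, since the same line keeps working if the model is later refit as, say, a glm.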


* Strictly speaking, since this is an lm model, the predict "method" that gets dispatched is predict.lm and that method requires that newdata be a data frame. predict.glm also requires a data frame. However, there are some predict methods that can take other types of arguments. For example:

  • The randomForest package has a predict method for randomForest models that can take a data frame or matrix as the newdata argument.
  • The glmnet package has a predict method for glmnet models that requires a matrix, although the argument is called newx rather than newdata in that case.
eipi10