Predict function on a Regression model giving error

Question

I am trying to predict the values of the y variable based on my polynomial model.

lumber.predict.plm=lm(lumber.unemployment.women$lumber.1980.2000 ~ 
                        scale(lumber.unemployment.women$woman.1980.2000) +
                        I(scale(lumber.unemployment.women$woman.1980.2000)^2))

xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.whole=data.frame(x=seq(xmin, xmax, length.out=500))
predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole,
                                       interval="confidence")

All of the above commands work fine except the last one, which gives the following error:

predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole,
                                       interval="confidence")

#Error in `$<-.data.frame`(`*tmp*`, "lumber", value = c(134.507238798567,  : 
#  replacement has 252 rows, data has 500
#In addition: Warning message:
#'newdata' had 500 rows but variables found have 252 rows

Structure of the data frame on which the regression is carried out:

str(lumber.unemployment.women)
#'data.frame':  252 obs. of  2 variables:
# $ lumber.1980.2000: num  108.2 109.9 109.6 99.8 97 ...
# $ woman.1980.2000 : num  5.8 5.9 5.7 6.3 6.4 6.5 6.6 6.7 6.3 6.7 ...

Why should the predicted values depend on the number of observations in my data frame?

Try feeding `predict` all the variables that you use, including the polynomial terms. – Roman Luštrik, Nov 29 '13 at 10:46

score 0 · Answer 1 · Stephen Henderson · 2013-11-29T11:11:30.080

I think the following is your problem, although the error message seems a bit obscure to me. Here is a simplified version of your code:

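L=data.frame(woman=1:100, lumber=1:100+rnorm(100))  # simulated data: lumber roughly equals woman
L.lm=lm(lumber ~ woman, data=L)                      # the predictor is called "woman"
xmin=-20; xmax=120                                   # range for the prediction grid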

The following gives an error because the new data contains a variable called "x", whereas the model was fitted with a variable called "woman". Note that lm() above did not automatically rename the predictor to "x".

nd=data.frame(x=seq(xmin, xmax, length.out=500))
predict(L.lm, newdata=nd,interval="confidence")

Error in eval(expr, envir, enclos) : object 'woman' not found

Rather, it is looking for "woman": if you ran summary(L.lm), you would find that the coefficient is named "woman", not "x".
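A quick way to see which column names `newdata` has to supply is to ask the model's terms for them. The sketch below rebuilds the simplified `L`/`L.lm` example (the `set.seed` call is an addition, only there to make the random data reproducible) and compares the model's predictor names with the columns of the new data frame; `needed` and `missing_vars` are illustrative names, not part of the original code.

```r
set.seed(1)  # added only so the simulated data is reproducible
L <- data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))
L.lm <- lm(lumber ~ woman, data = L)

# Names of the predictors the model will look up in newdata (response removed)
needed <- all.vars(delete.response(terms(L.lm)))

# New data with the wrong column name
nd <- data.frame(x = seq(-20, 120, length.out = 500))
missing_vars <- setdiff(needed, names(nd))
print(missing_vars)  # "woman" is missing, so predict() cannot find it in nd

# New data with the right column name
nd <- data.frame(woman = seq(-20, 120, length.out = 500))
print(setdiff(needed, names(nd)))  # character(0): every predictor is present
```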

The following works because the original and new data contain the same variable:

nd=data.frame(woman=seq(xmin, xmax, length.out=500))
predict(L.lm, newdata=nd,interval="confidence")

        fit       lwr       upr
1 -20.32932 -20.85072 -19.80792
2 -20.04737 -20.56699 -19.52775
3 -19.76542 -20.28327 -19.24757
4 -19.48347 -19.99955 -18.96740
5 -19.20153 -19.71582 -18.68723
6 -18.91958 -19.43210 -18.40705
etc.

P.S. Just to be clear, this will also work with

L.lm= lm(lumber ~ poly(woman,2), data=L)

which is a cleaner way of expressing polynomial fits.
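As a sketch of the poly() route, reusing the simplified `L` data (the seed and the 500-point grid are illustrative assumptions): `predict()` re-uses the orthogonal-polynomial coefficients saved at fit time, so the prediction grid can stay on the original `woman` scale. `predict(..., interval = "confidence")` returns a matrix with columns `fit`, `lwr` and `upr`, and `cbind()` turns it into ordinary columns next to the grid, which also makes it easy to plot against `woman` itself.

```r
set.seed(1)  # illustrative data only
L <- data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))

# Orthogonal quadratic polynomial in woman
L.poly <- lm(lumber ~ poly(woman, 2), data = L)

# Grid on the original woman scale; predict() rebuilds poly() for these values
nd <- data.frame(woman = seq(min(L$woman), max(L$woman), length.out = 500))
pred <- predict(L.poly, newdata = nd, interval = "confidence")

# pred is a 500 x 3 matrix (fit, lwr, upr): bind it to the grid as columns
curve_df <- cbind(nd, pred)
head(curve_df)

# Plot against the original variable, not a transformed one
plot(lumber ~ woman, data = L)
lines(fit ~ woman, data = curve_df)
lines(lwr ~ woman, data = curve_df, lty = 2)
lines(upr ~ woman, data = curve_df, lty = 2)
```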

answered Nov 29 '13 at 10:58 by Stephen Henderson, edited Nov 29 '13 at 11:11

Unfortunately the same error persists, even after I changed the variable name :(. Running `predicted.lumber.whole=data.frame(woman.1980.2000=seq(xmin, xmax, length.out=500))` followed by `predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole, interval="confidence")` still gives: Error in `$<-.data.frame`(`*tmp*`, "lumber", value = c(134.507238798567, : replacement has 252 rows, data has 500. In addition: Warning message: 'newdata' had 500 rows but variables found have 252 rows – Rajat Panda Nov 29 '13 at 11:54
Thanks for the note on polynomial fits. Is there a way to create a model using the poly function but with a scaled x variable, i.e. `L.lm= lm(lumber ~ poly(scale(woman),2), data=L)`, while still plotting the graph as lumber ~ woman and NOT lumber ~ scale(woman)? – Rajat Panda Nov 29 '13 at 12:09
Nope, sorry, I cannot see the other error. What happens when you just run it without assignment: `predict(lumber.predict.plm,newdata=predicted.lumber.whole,interval="confidence")`? – Stephen Henderson Nov 29 '13 at 12:13
In the book IPSUR, page 304, scaling is advised for regression even when you are dealing with only one variable, if the polynomial degree is more than one. – Rajat Panda Nov 29 '13 at 12:21
On reflection, yes, you're right about scale. And sorry, no, I can't think how to plot against the original. – Stephen Henderson Nov 29 '13 at 12:34
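One possible way to keep the scaling and still plot against the original variable, sketched here with the simplified `L` data (the seed and the names `woman_mean`, `woman_sd`, `z` and `L.lm.z` are hypothetical, introduced only for this illustration), is to standardise `woman` once by hand using the training mean and standard deviation, then apply the same two numbers to the prediction grid. Because the transformation is fixed, doing it outside the formula also sidesteps any doubt about whether a term such as `I(scale(x)^2)` gets re-scaled with the new data's own mean and standard deviation at prediction time. The grid keeps its original `woman` column, so the fitted curve is drawn on the untransformed axis.

```r
set.seed(1)  # illustrative data only
L <- data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))

# Centre and scale ONCE, using the training data only
woman_mean <- mean(L$woman)
woman_sd   <- sd(L$woman)
L$z <- (L$woman - woman_mean) / woman_sd

# Quadratic model in the standardised variable
L.lm.z <- lm(lumber ~ z + I(z^2), data = L)

# Grid on the original scale, transformed with the SAME mean and sd
nd <- data.frame(woman = seq(min(L$woman), max(L$woman), length.out = 500))
nd$z <- (nd$woman - woman_mean) / woman_sd
nd <- cbind(nd, predict(L.lm.z, newdata = nd, interval = "confidence"))

# The x axis is the original woman variable, not z
plot(lumber ~ woman, data = L)
lines(fit ~ woman, data = nd)
```

Since z and z^2 span the same space as woman and woman^2, the fitted curve is the same as that of an unscaled quadratic; the scaling only changes how the coefficients are expressed.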

score 0 · Answer 2 · answered Nov 29 '13 at 21:53

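I just modified the linear model (a new name, and the formula now uses the bare column names with `data=lumber.unemployment.women` instead of the `lumber.unemployment.women$` prefixes), and it works fine. I don't know the root cause of the earlier error, though; it would be great if someone could explain it. The modified script is below.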

lumber.predict.plm1=lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
                        I(scale(woman.1980.2000)^2), data=lumber.unemployment.women)
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all=data.frame(woman.1980.2000=seq(xmin,xmax,length.out=100))
predicted.lumber.all$lumber=predict(lumber.predict.plm1,newdata=predicted.lumber.all)
str(predicted.lumber.all)
#'data.frame':   100 obs. of  2 variables:
# $ woman.1980.2000: num  3.3 3.36 3.42 3.48 3.54 ...
# $ lumber         : num  195 193 192 190 188 ...
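The earlier error most likely comes from how the first model was written: its formula refers to `lumber.unemployment.women$woman.1980.2000`, and when `predict()` builds the model frame it evaluates that whole expression instead of looking up a column of that name in `newdata`. The expression therefore still returns the 252 original values, which triggers the "'newdata' had 500 rows but variables found have 252 rows" warning and a 252-element prediction that cannot be assigned into a 500-row data frame. The sketch below reproduces this with a small made-up data frame (`d`, `x`, `y`, `grid`, `bad` and `good` are hypothetical names) and then shows the `data=` version.

```r
set.seed(1)  # made-up data with the same number of rows as the question (252)
d <- data.frame(x = runif(252, 3, 11))
d$y <- 100 + 5 * d$x - 0.3 * d$x^2 + rnorm(252)

grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 500))

# Problem: the formula uses d$x, so predict() evaluates d$x itself
# and never uses the 'x' column of 'grid' (it only warns about it)
bad <- lm(d$y ~ d$x + I(d$x^2))
p_bad <- suppressWarnings(predict(bad, newdata = grid))
length(p_bad)   # 252: one value per original row, not per grid point

# Fix: use bare column names in the formula and pass the data frame via data=
good <- lm(y ~ x + I(x^2), data = d)
p_good <- predict(good, newdata = grid)
length(p_good)  # 500: one value per grid point
```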
