
With the following setup, why does one get the same standard deviations in both cases, namely 1.396411?

Regression:

CopierDataRegression <- lm(V1~V2, data=CopierData1)

Intervals:

X6 <- data.frame(V2=6)
predict(CopierDataRegression, X6, se.fit=TRUE, interval="confidence", level=0.90)
predict(CopierDataRegression, X6, se.fit=TRUE, interval="prediction", level=0.90)

Both give the same result for se.fit.

One gets the correct standard deviations for the predictions with the following code:

z <- predict(CopierDataRegression, X6, se.fit=TRUE)
sqrt(z$se.fit^2 + z$residual.scale^2)

However, I don't understand why this formula adds the residual standard deviation when computing the standard error of the prediction. Could someone explain this?
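For reference, a quick sanity check (assuming the regression above has been fitted on the data below) that this combined value is indeed the standard error behind the prediction interval: it can be backed out of the reported interval width.

p <- predict(CopierDataRegression, X6, se.fit=TRUE, interval="prediction", level=0.90)
# half-width of the 90% prediction interval, divided by the t quantile
(p$fit[, "upr"] - p$fit[, "lwr"]) / (2 * qt(0.95, df=p$df))
# agrees with the combined standard error
sqrt(p$se.fit^2 + p$residual.scale^2)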

Data:

CopierData1 <- structure(list(V1 = c(20L, 60L, 46L, 41L, 12L, 137L, 68L, 89L, 
          4L, 32L, 144L, 156L, 93L, 36L, 72L, 100L, 105L, 131L, 127L, 57L, 
          66L, 101L, 109L, 74L, 134L, 112L, 18L, 73L, 111L, 96L, 123L, 
          90L, 20L, 28L, 3L, 57L, 86L, 132L, 112L, 27L, 131L, 34L, 27L, 
          61L, 77L), V2 = c(2L, 4L, 3L, 2L, 1L, 10L, 5L, 5L, 1L, 2L, 9L, 
          10L, 6L, 3L, 4L, 8L, 7L, 8L, 10L, 4L, 5L, 7L, 7L, 5L, 9L, 7L, 
          2L, 5L, 7L, 6L, 8L, 5L, 2L, 2L, 1L, 4L, 5L, 9L, 7L, 1L, 9L, 2L, 
          2L, 4L, 5L)), .Names = c("V1", "V2"),
          class = "data.frame", row.names = c(NA, -45L))
HeyJane
  • Seems like one function is delivering an estimate in the coefficient/parameters space and the other in the data space. – IRTFM Oct 07 '17 at 18:21
  • for the next person :) --- the linked duplicate has a very thorough answer on the underlying math. If you just want to know what you're seeing with `se.fit` --- yes, it basically is always showing the standard error for Confidence, and no, there isn't another built-in value for the standard error for Prediction. And no, that doesn't exactly make sense to a casual user. – Mike M Feb 18 '21 at 01:34

1 Answer


When you make a prediction for a new observation, you have to account both for the error in the estimated mean (which comes from sampling) and for the noise term itself. The confidence interval only accounts for the former. See the answer here.
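To make this concrete, here is a small sketch (using the regression and X6 from the question) of the two variance components that enter a prediction for a new observation, and how they combine:

z <- predict(CopierDataRegression, X6, se.fit=TRUE)
# variance of the estimated mean response at V2 = 6 (sampling error only)
var_mean  <- z$se.fit^2
# variance of the noise term, estimated via the residual standard deviation
var_noise <- z$residual.scale^2
# a new observation deviates from the estimated mean by both (independent)
# sources of error, so the variances add
sqrt(var_mean + var_noise)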

Further, they do not give the same result for the bounds:

> predict(CopierDataRegression, X6, 
+         se.fit=TRUE, interval="confidence", level=0.90)$fit
       fit      lwr     upr
1 89.63133 87.28387 91.9788
> predict(CopierDataRegression, X6, 
+         se.fit=TRUE, interval="prediction", level=0.90)$fit
       fit      lwr      upr
1 89.63133 74.46433 104.7983

The se.fit element only gives you the standard error of the predicted mean, not the standard deviation of the error term, as documented in ?predict.lm:

se.fit standard error of predicted means

residual.scale residual standard deviations
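If it helps, both intervals can be rebuilt by hand from these two pieces (a sketch under the same setup as above), which shows exactly where residual.scale enters:

z  <- predict(CopierDataRegression, X6, se.fit=TRUE)
tq <- qt(0.95, df=z$df)  # t quantile for a two-sided 90% interval
# confidence interval for the mean response: uses se.fit alone
z$fit + c(-1, 1) * tq * z$se.fit
# prediction interval for a new observation: residual.scale^2 is added
z$fit + c(-1, 1) * tq * sqrt(z$se.fit^2 + z$residual.scale^2)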

  • That's my point. That is why I'm writing: "_With the following setup, why does one get the same standard deviations in both cases, namely 1.396411?_" – HeyJane Oct 08 '17 at 06:30
  • Sorry, I was replying to "_... I don't understand why this formula adds the residual standard deviation when computing the standard error of the prediction. Could someone explain this?_" – Benjamin Christoffersen Oct 08 '17 at 06:48
  • I have edited my answer to clarify what the `se.fit` element is of the returned object. It should be the same regardless of `interval` argument. – Benjamin Christoffersen Oct 08 '17 at 07:02
  • I just don't see what new information you are adding to what I've written. Obviously I know the difference between prediction and estimation. I think R's package should give a different answer for se.fit when choosing between prediction or confidence intervals, as I questioned. Second question is how come one adds the residual standard deviation, and if somebody could show this mathematically. – HeyJane Oct 08 '17 at 10:18