I'm studying the linear regression functions in R and am confused by the output I'm receiving:

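```r
library(ISLR)                      # provides the Auto data set
data("Auto")
model1 <- lm(mpg ~ horsepower, data = Auto)
summary(model1)
# 95% interval for the mean mpg of cars with horsepower = 98
predict(model1, data.frame(horsepower = c(98)), interval = "confidence")
# 95% interval for a single new car with horsepower = 98
predict(model1, data.frame(horsepower = c(98)), interval = "prediction")
```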

Here, the model summary I get is this:

```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.935861   0.717499   55.66   <2e-16 ***
horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared:  0.6059,    Adjusted R-squared:  0.6049 
F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
```
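The "Std. Error" column in this summary shows only the diagonal of the coefficients' covariance matrix; the covariance between the intercept and the slope, which also enters the interval calculations, is not printed. A minimal sketch, assuming the built-in `mtcars` data (`mpg ~ hp`) as a stand-in for `ISLR::Auto` so it runs without extra packages:

```r
# Stand-in for the question's model (assumption): mtcars ships with base R.
model_mt <- lm(mpg ~ hp, data = mtcars)

# 2 x 2 covariance matrix of the estimates (intercept, slope)
V <- vcov(model_mt)

# Its diagonal reproduces the "Std. Error" column printed by summary()
se_coef <- sqrt(diag(V))
print(all.equal(unname(se_coef),
                unname(coef(summary(model_mt))[, "Std. Error"])))

# The off-diagonal covariance is not shown by summary(). It is negative here
# because the mean of hp is positive: Cov(b0, b1) = -mean(x) * sigma^2 / Sxx.
print(V["(Intercept)", "hp"])
```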

The first predict function gives me:

```
       fit      lwr      upr
1 24.46708 23.97308 24.96108
```

I thought the confidence interval would be the fit ± 2 × the standard error of the horsepower coefficient. How does R arrive at its confidence and prediction intervals?
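One way to see what `predict()` is doing is to rebuild both intervals by hand. The half-width is a t quantile (about 1.97 for 390 degrees of freedom, hence the rule-of-thumb 2) times the standard error of the *fitted value* at x0, which combines both coefficient SEs and their covariance. The slope's SE alone is in mpg per horsepower, so it cannot serve as the half-width by itself. The prediction interval additionally adds the residual variance. A sketch, assuming the built-in `mtcars` data (`mpg ~ hp`) as a stand-in for `ISLR::Auto`:

```r
# Stand-in data (assumption): mtcars from base R instead of ISLR::Auto.
model_mt <- lm(mpg ~ hp, data = mtcars)
new_hp <- 98                                   # same x0 as in the question

x0 <- c(1, new_hp)                             # design row: intercept term, then hp
b <- coef(model_mt)                            # (intercept, slope)
V <- vcov(model_mt)                            # covariance matrix of b

fit_hat <- sum(x0 * b)                         # point estimate at x0
# SE of the fitted mean: sqrt(Var(b0) + x0^2 Var(b1) + 2 x0 Cov(b0, b1))
se_fit <- sqrt(drop(t(x0) %*% V %*% x0))
sigma_hat <- sigma(model_mt)                   # residual standard error
t_crit <- qt(0.975, df = df.residual(model_mt))

conf_int <- fit_hat + c(-1, 1) * t_crit * se_fit
pred_int <- fit_hat + c(-1, 1) * t_crit * sqrt(se_fit^2 + sigma_hat^2)

# Compare with R's own results
r_ci <- predict(model_mt, data.frame(hp = new_hp), interval = "confidence")
r_pi <- predict(model_mt, data.frame(hp = new_hp), interval = "prediction")
print(all.equal(conf_int, unname(r_ci[1, c("lwr", "upr")])))
print(all.equal(pred_int, unname(r_pi[1, c("lwr", "upr")])))
```

With `ISLR` installed, swapping `mtcars`/`hp` for `Auto`/`horsepower` should reproduce the question's interval in the same way.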

Edited by Zheyuan Li; asked by xyy.
  • http://stats.stackexchange.com/questions/85560/shape-of-confidence-interval-for-predicted-values-in-linear-regression – Ben Bolker Sep 08 '15 at 01:22
  • The `predict` function is written in R (for `lm` fits it dispatches to the `predict.lm` method), so you can check its source to see how the intervals are calculated. The interval code apparently starts at line 145 of the function. – Molx Sep 08 '15 at 01:30
  • You need the intercept and main effect to estimate at a point. – IRTFM Sep 08 '15 at 03:37
  • @BondedDust can you elaborate on what you mean? I'm quite new at this, thanks. – xyy Sep 08 '15 at 03:55
  • It should be clear if you look at the plots on the page cited by @BenBolker. The uncertainty regarding the "intercept", which is really the uncertainty in the fitted value at the mean of X, causes the narrow waist in the band of predictions, while the uncertainty regarding the slope causes the band to widen at either end, where the points that support the prediction are thinning out. So the S.E. of the intercept and the S.E. of the slope act together, along with their covariance. The variance of the fitted value turns out to be a quadratic function of x, for both 95% CIs and prediction intervals. – IRTFM Sep 08 '15 at 23:30
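The comment's point can be written out with the standard simple-linear-regression formulas (textbook results, stated here for $n$ observations with $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$ and $S_{xx} = \sum_i (x_i - \bar x)^2$). The variance of the fitted value is quadratic in $x_0$ and smallest at $x_0 = \bar x$, which gives the narrow waist; the prediction interval adds one extra $\hat\sigma^2$ for the scatter of a single new observation:

$$
\begin{aligned}
\widehat{\operatorname{Var}}(\hat y_0) &= \widehat{\operatorname{Var}}(\hat\beta_0) + x_0^2\,\widehat{\operatorname{Var}}(\hat\beta_1) + 2x_0\,\widehat{\operatorname{Cov}}(\hat\beta_0,\hat\beta_1)
= \hat\sigma^2\left(\frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}\right),\\
\text{confidence: }\quad & \hat y_0 \pm t_{0.975,\,n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}},\\
\text{prediction: }\quad & \hat y_0 \pm t_{0.975,\,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}}.
\end{aligned}
$$

For the question's model the summary gives $\hat\sigma = 4.906$ and $n - 2 = 390$, so $t_{0.975,\,390} \approx 1.97$.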

0 Answers