2

I'm trying to calculate the prediction interval for a Multivariate Linear Regression by hand, and am noticing some discrepancies with the predict.lm() output and my manual calculation in R.

Here's my code:

library(MASS)
data("Boston") # using the boston dataset
df <- Boston
model <- lm(medv ~ crim + nox + age + dis + tax + black + 0, data = df) # using select variables rather than all variables
n = 3 # set observation number
(pred_val <- predict.lm(object = model, newdata = df[n,], interval = "prediction", level = 0.95))

Output from predict.lm():

fit lwr upr 24.91563 8.025451 41.8058

Now I try to calculate it by hand:

dof <- nrow(df) - 6 # calculate degrees of freedom, subtracting 6 predictor variables (no intercept term)
t_crit <- qt(p = c(.025, .975), df = dof) # calculate t-critical values for 95% confidence
se <- sigma(model) # get standard error of regression
pred_val[1] + (t_crit * se) # add the errors to the predicted value

Output:

[1] 8.060998 41.770252

The lower bound of the predict.lm() function and my manual calculation are close, but aren't an exact match! Is there something that I'm doing wrong?

Legacy
  • 43
  • 1
  • 5

0 Answers0