I'm trying to calculate the prediction interval for a Multivariate Linear Regression by hand, and am noticing some discrepancies with the predict.lm()
output and my manual calculation in R.
Here's my code:
library(MASS)
data("Boston") # using the boston dataset
df <- Boston
model <- lm(medv ~ crim + nox + age + dis + tax + black + 0, data = df) # using select variables rather than all variables
n = 3 # set observation number
(pred_val <- predict.lm(object = model, newdata = df[n,], interval = "prediction", level = 0.95))
Output from predict.lm()
:
fit lwr upr
24.91563 8.025451 41.8058
Now I try to calculate it by hand:
dof <- nrow(df) - 6 # calculate degrees of freedom, subtracting 6 predictor variables (no intercept term)
t_crit <- qt(p = c(.025, .975), df = dof) # calculate t-critical values for 95% confidence
se <- sigma(model) # get standard error of regression
pred_val[1] + (t_crit * se) # add the errors to the predicted value
Output:
[1] 8.060998 41.770252
The lower bound of the predict.lm()
function and my manual calculation are close, but aren't an exact match! Is there something that I'm doing wrong?