3

I am trying to calculate DFFITS by hand. The value obtained should be equal to the first value obtained by dffits function. However there must be something wrong with my own calculation.

attach(cars)

x1 <- lm(speed ~ dist, data = cars) # all observations

x2 <-  lm(speed ~ dist, data = cars[-1,]) # without first obs

x <- model.matrix(speed ~ dist) # x matrix

h <- diag(x%*%solve(crossprod(x))%*%t(x)) # hat values

num_dffits <- x1$fitted.values[1] - x2$fitted.values[1] #Numerator

denom_dffits <- sqrt(anova(x2)$`Mean Sq`[2]*h[1]) #Denominator

df_fits <- num_dffits/denom_dffits #DFFITS

dffits(x1)[1] # DFFITS function
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
Karthik Shanmukha
  • 417
  • 1
  • 5
  • 14
  • 1
    In the numerator , it is the difference between the first fitted value with all observations and the first fitted values with out the first observation. In the denominator, it is the square root of variance of fitted value with out the first observation. – Karthik Shanmukha Jun 27 '17 at 11:03

1 Answers1

4

Your numerator is wrong. As you have removed first datum from the second model, corresponding predicted value is not in fitted(x2). We need to use predict(x2, cars[1, ]) in place of fitted(x2)[1].


Hat values can be efficiently computed by

h <- rowSums(qr.Q(x1$qr) ^ 2)

or using its R wrapper function

h <- hat(x1$qr, FALSE)

R also has a generic function for getting hat values, too:

h <- lm.influence(x1, FALSE)$hat

or its wrapper function

h <- hatvalues(x1)

You also don't have to call anova to get MSE:

c(crossprod(x2$residuals)) / x2$df.residual
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248