Calculation of DFFITS as diagnostic for Leverage and Influence in regression

Question

I am trying to calculate DFFITS by hand. The value obtained should be equal to the first value obtained by dffits function. However there must be something wrong with my own calculation.

attach(cars)

x1 <- lm(speed ~ dist, data = cars) # all observations

x2 <-  lm(speed ~ dist, data = cars[-1,]) # without first obs

x <- model.matrix(speed ~ dist) # x matrix

h <- diag(x%*%solve(crossprod(x))%*%t(x)) # hat values

num_dffits <- x1$fitted.values[1] - x2$fitted.values[1] #Numerator

denom_dffits <- sqrt(anova(x2)$`Mean Sq`[2]*h[1]) #Denominator

df_fits <- num_dffits/denom_dffits #DFFITS

dffits(x1)[1] # DFFITS function

In the numerator , it is the difference between the first fitted value with all observations and the first fitted values with out the first observation. In the denominator, it is the square root of variance of fitted value with out the first observation. — Karthik Shanmukha, Jun 27 '17 at 11:03

Zheyuan Li · Accepted Answer · 2017-06-27T16:05:15.470

Your numerator is wrong. As you have removed first datum from the second model, corresponding predicted value is not in fitted(x2). We need to use predict(x2, cars[1, ]) in place of fitted(x2)[1].

Hat values can be efficiently computed by

h <- rowSums(qr.Q(x1$qr) ^ 2)

or using its R wrapper function

h <- hat(x1$qr, FALSE)

R also has a generic function for getting hat values, too:

h <- lm.influence(x1, FALSE)$hat

or its wrapper function

h <- hatvalues(x1)

You also don't have to call anova to get MSE:

c(crossprod(x2$residuals)) / x2$df.residual

Calculation of DFFITS as diagnostic for Leverage and Influence in regression

1 Answers1

Linked