I have the data frame `df` below, with the fitted regression equation

`regression_eq <- -0.09999975 * df$x + 1.999999 * df$y`

and I am trying to determine the leverage points by hand. I have reviewed many sources and am still stuck. I understand that you would typically fit the model with `lm` and then produce the diagnostic plots in R, but I would like to do this by hand because I obtained the regression results separately.

        response        x             y       xx            xy           yy      xxx
1 -0.1999981 2.000000 -4.794927e-09 4.000000 -9.589855e-09 2.299133e-17 8.000000
2 -0.2796748 1.997601 -3.995733e-02 3.990411 -7.981882e-02 1.596588e-03 7.971252
3 -0.3590789 1.994407 -7.981885e-02 3.977661 -1.591913e-01 6.371049e-03 7.933076
4 -0.4381798 1.990421 -1.195688e-01 3.961775 -2.379922e-01 1.429669e-02 7.885600
5 -0.5169470 1.985645 -1.591913e-01 3.942786 -3.160975e-01 2.534188e-02 7.828973
6 -0.5953499 1.980083 -1.986710e-01 3.920729 -3.933850e-01 3.947016e-02 7.763370

If it is possible to compute this manually in R, please let me know. I am unsure of the results I have found so far and want to confirm whether they are correct.
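For reference, here is a minimal sketch of the kind of manual calculation I mean. It assumes a model without an intercept (to match the equation above) and uses only the six rows shown, so the numbers would differ on the full data. The leverages are taken as the diagonal of the hat matrix `H = X (X'X)^{-1} X'`. The names `X`, `H`, `leverage`, `beta`, `res`, and `rstd` are just placeholders, and the `2p/n` cutoff is a common rule of thumb, not a hard rule.

```r
# First six rows from the table above (only the columns the model uses).
df <- data.frame(
  response = c(-0.1999981, -0.2796748, -0.3590789,
               -0.4381798, -0.5169470, -0.5953499),
  x = c(2.000000, 1.997601, 1.994407, 1.990421, 1.985645, 1.980083),
  y = c(-4.794927e-09, -3.995733e-02, -7.981885e-02,
        -1.195688e-01, -1.591913e-01, -1.986710e-01)
)

# Assumption: no intercept, i.e. response = b1 * x + b2 * y.
X <- cbind(x = df$x, y = df$y)
n <- nrow(X)
p <- ncol(X)

# Hat matrix H = X (X'X)^{-1} X'; the leverages are its diagonal.
H <- X %*% solve(crossprod(X)) %*% t(X)
leverage <- diag(H)

# Cross-check against R's built-in diagnostics for the same model.
fit <- lm(response ~ 0 + x + y, data = df)
print(all.equal(unname(leverage), unname(hatvalues(fit))))

# Rule-of-thumb flag (an assumption): leverage above 2p/n.
print(which(leverage > 2 * p / n))

# Manual least-squares coefficients and standardized residuals,
# r_i = e_i / (s * sqrt(1 - h_ii)), for a hand-made leverage plot.
beta <- solve(crossprod(X), crossprod(X, df$response))
res  <- df$response - drop(X %*% beta)
s    <- sqrt(sum(res^2) / (n - p))
rstd <- res / (s * sqrt(1 - leverage))

plot(leverage, rstd,
     xlab = "Leverage", ylab = "Standardized residuals",
     main = "Residuals vs Leverage (manual)")
abline(h = 0, lty = 2)
```

If the real model does include an intercept, the design matrix would instead be `cbind(1, df$x, df$y)` and the `lm` formula `response ~ x + y`.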

Thanks in advance.

AW27
  • Hi AW27, I am not a linear regression guru, but as I recall, outliers with large residuals tend to "pull" the regression towards them. Leverage itself is about the x-values: `points with extreme X values have high leverage`, i.e. high-leverage points lie outside the majority of the other x-values. So finding them comes down to finding the points whose x-values fall outside a chosen interval, and likewise for the residuals. A point with both high leverage and a large residual will pull the regression more heavily. – Ray Nov 07 '21 at 15:17
  • Hi Ray, thanks, I think you're right, but I am also interested in manually producing plots similar to those from the `plot` function in `R`, using manually calculated leverage. – AW27 Nov 07 '21 at 15:27