
I have defined two variables, `x` and `y`, and I want to regress `y` on `x`. However, the residuals from `lm` do not sum to zero.

Here are the variables:

```r
x <- c(1,10,6,4,3,5,8,9,0,3,1,1,12,6,3,11,15,5,10,4)
y <- c(2,3,6,7,8,4,2,1,0,0,6,1,3,5,2,4,1,0,1,9)
gh <- lm(y ~ x)

sum(gh$residuals)
# [1] 4.718448e-16
```

I don't understand why the sum is non-zero. Since the model includes an intercept, OLS should make the residuals sum to exactly zero.
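For reference, the zero-sum property follows from the first-order condition for the intercept $\beta_0$ in the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, which is the model `lm(y ~ x)` fits. It is a property of exact arithmetic and holds only when the model has an intercept:

```latex
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\
\left.\frac{\partial S}{\partial \beta_0}\right|_{\hat\beta_0,\,\hat\beta_1} &= -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0 \\
\Longrightarrow \quad \sum_{i=1}^{n} e_i &= 0, \qquad e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i
\end{aligned}
```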

Thanks

Zheyuan Li
user3635913
  • 4e-16 is basically zero given floating-point precision ([somewhat related post](http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal)) – digEmAll Jun 02 '14 at 10:10
  • For fun, have you tried `0.1+0.1+0.1 == 0.3`? Well, it returns `FALSE` :) – digEmAll Jun 02 '14 at 10:18
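A small sketch of the comment's point in R: exact `==` on doubles is fragile, while base R's `all.equal()` compares within a tolerance. The variable name `s` is just for illustration.

```r
s <- 0.1 + 0.1 + 0.1
s == 0.3                   # FALSE: the two doubles differ by one unit in the last place
isTRUE(all.equal(s, 0.3))  # TRUE: equal within the default tolerance (~1.5e-8)
s - 0.3                    # about 5.55e-17, pure rounding error
```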


Floating-point numbers have limited precision. Only a finite set of real numbers can be represented exactly as 32- or 64-bit floats; the rest are approximated by rounding them to the nearest number that can be represented exactly.
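In R, numeric values are 64-bit IEEE 754 doubles. A quick way to see the rounding described above is to print more digits than R shows by default; this sketch uses only base R (`sprintf()` and `.Machine`):

```r
# 0.1 has no exact binary representation, so the stored double is slightly off
sprintf("%.20f", 0.1)   # "0.10000000000000000555"
# Gap between 1 and the next representable double
.Machine$double.eps     # 2.220446e-16
```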

So although the residuals sum to exactly zero mathematically, each computed residual carries a tiny rounding error, and their sum comes out as something on the order of 1e-16 rather than exactly 0.
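A minimal sketch of checking the question's residual sum against a tolerance instead of exact zero. It uses base R's `all.equal()`, whose default numeric tolerance is `sqrt(.Machine$double.eps)` (about 1.5e-8); `residuals(gh)` returns the same values as `gh$residuals`. The name `r_sum` is only for illustration, and the exact printed sum may vary across platforms and BLAS libraries.

```r
x <- c(1,10,6,4,3,5,8,9,0,3,1,1,12,6,3,11,15,5,10,4)
y <- c(2,3,6,7,8,4,2,1,0,0,6,1,3,5,2,4,1,0,1,9)
gh <- lm(y ~ x)

# Sum of residuals as computed in floating point (tiny, possibly non-zero)
r_sum <- sum(residuals(gh))

# Exact test: fragile, because rounding can leave a value like 4.7e-16
r_sum == 0

# Tolerance-based test: TRUE whenever |r_sum| is below all.equal()'s tolerance
isTRUE(all.equal(r_sum, 0))
```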

I highly recommend David Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic".

NPE