From my understanding, the significance of an intercept term (β0 in y = β0 + β1x + ɛ) is tested against zero: a non-significant result means we cannot reject β0 = 0, whereas a significant result suggests β0 ≠ 0.

If this is indeed the case, why does a simple linear model in R, fitted to data whose intercept is zero by construction, yield a significant intercept coefficient? Please see the example below:

```r
x <- -5:50
y <- -5:50
plot(y ~ x)  # Plot y against x; the points lie on a straight line through the origin
```

```r
summary(lm(y ~ x))
```

```
Call: lm(formula = y ~ x)

Residuals:
       Min         1Q     Median         3Q        Max
-1.638e-15 -5.034e-16 -1.994e-16  1.047e-16  3.016e-15

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -3.798e-15  2.153e-16 -1.764e+01   <2e-16
x            1.000e+00  7.772e-18  1.287e+17   <2e-16
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.401e-16 on 54 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1
F-statistic: 1.655e+34 on 1 and 54 DF,  p-value: < 2.2e-16

Warning message:
In summary.lm(lm(y ~ x)) : essentially perfect fit: summary may be
unreliable
```

My additional question is: how does base R calculate the significance of an intercept coefficient? (A formula would be much appreciated.)
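
For reference, the textbook test for the intercept in simple linear regression can be written as below. This is a sketch of the standard closed form, with $n$ observations, $S_{xx}$ the sum of squared deviations of $x$, and $\hat\sigma$ the residual standard error; `lm()` reaches the same numbers through a QR decomposition rather than by evaluating these expressions literally.

$$
\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2, \qquad
\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2
$$

$$
\operatorname{SE}(\hat\beta_0) = \hat\sigma \sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}, \qquad
t = \frac{\hat\beta_0}{\operatorname{SE}(\hat\beta_0)}, \qquad
p = 2\,P\left(T_{n-2} \ge |t|\right)
$$

With the data in this question, $n = 56$, $\bar x = 22.5$ and $S_{xx} = 14630$, so $\operatorname{SE}(\hat\beta_0) \approx 0.229\,\hat\sigma$. Taking $\hat\sigma = 9.401 \times 10^{-16}$ from the output gives about $2.153 \times 10^{-16}$, and $-3.798 \times 10^{-15} / 2.153 \times 10^{-16} \approx -17.64$, which matches the reported t value. Both the numerator and the denominator are on the scale of floating-point rounding error, so their ratio says little about whether β0 is really zero.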

lovalery
  • 5
    I would direct your attention to the warning: *essentially perfect fit: summary may be unreliable*. When there is no variation, statistical methods to assess variability don't necessarily work. And it tells you that this is a problem. – Gregor Thomas Oct 18 '18 at 16:46
  • 2
    I don't know that it's a duplicate, but [Why are these numbers not equal](https://stackoverflow.com/q/9508518/903061) is good related reading. – Gregor Thomas Oct 18 '18 at 16:47
  • 1
    If you try with `y = x + rnorm(56)` you'll get an insignificant result (unless you're really unlucky) – IceCreamToucan Oct 18 '18 at 16:48
  • 3
    The standard error here is theoretically 0, which means the t value and thus p-value are actually undefined. – IceCreamToucan Oct 18 '18 at 16:51
  • 1
    I think it also has to do with integer and decimals in your vectors. Try this: `x1 = sample(1:2000, 1000); y1 = x1; x2 = rnorm(1000, 0, 5); y2 = x2; summary(lm(y1~x1)); summary(lm(y2~x2));` But, as a general rule, don't trust the output when there's a warning like "summary may be unreliable", as @Gregor mentioned above. – AntoniosK Oct 18 '18 at 16:51
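
The suggestion in the comments to add noise can be tried directly. The sketch below is an illustration only: the seed, the unit-variance noise, and the variable names (`sxx`, `se_b0`, `t_b0`, `p_b0`) are assumptions that do not appear in the original post. It computes the intercept's standard error, t value, and p-value from the textbook closed form and places them next to what `summary()` reports.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility (assumption)
x <- -5:50
y <- x + rnorm(length(x))        # true intercept 0, true slope 1, plus N(0, 1) noise

fit <- lm(y ~ x)
n   <- length(x)

# Closed-form pieces of the intercept test
sxx   <- sum((x - mean(x))^2)                     # sum of squared deviations of x
sigma <- sqrt(sum(residuals(fit)^2) / (n - 2))    # residual standard error
se_b0 <- sigma * sqrt(1 / n + mean(x)^2 / sxx)    # standard error of the intercept
t_b0  <- unname(coef(fit)[1]) / se_b0             # t statistic for H0: beta0 = 0
p_b0  <- 2 * pt(abs(t_b0), df = n - 2, lower.tail = FALSE)  # two-sided p-value

# Compare with the coefficient table produced by summary()
tab <- coef(summary(fit))
print(rbind(manual  = c(se_b0, t_b0, p_b0),
            summary = tab["(Intercept)", c("Std. Error", "t value", "Pr(>|t|)")]))
```

With noise of this size the intercept will usually come out non-significant. With an exact fit, by contrast, `sigma` collapses to rounding error, so the t value is a ratio of two numerical artefacts, which is what the "essentially perfect fit" warning is flagging.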