R - linear regression through the first point - why is my intuition wrong?

Question

Summary : I have fitted my data with a regression line constrained through the first point. This least-squares regression (red line) seems unreliable because of its low slope value.

Here are my data :

library(dplyr); library(stats); library(ggplot2)
data = data.frame(y = c(3.86, 5.22, 7.77, 8.62, 6.2, 4.39, 3.76, 3.31, 3.26), x = c(0:8))

I want to fit y on x with a linear regression through the first point.

intercept <- data$y[1]

lm_model = lm( y-intercept~ x + 0, data = data)

I plot the model (regression line in red) :

data_1 <- data %>% mutate(y_fitted = lm_model$fitted.values + intercept) 

ggplot(data = data_1, aes(x = x)) + 
  geom_point(aes(y = y), color = "black") + 
  geom_path(aes(y = y_fitted), color = "red")

However, the linear regression seems unreliable with a too low slope.

If I multiply the slope by 2.5, I get this new regression that seems more reliable (in orange) :

data_1 <- data_1 %>% mutate(y_fitted_bis = lm_model$fitted.values*2.5 + intercept)

ggplot(data = data_1, aes(x = x)) + 
  geom_point(aes(y = y), color = "black") + 
  geom_path(aes(y = y_fitted), color = "red") +
  geom_path(aes(y = y_fitted_bis), color = "orange")

Why does the red linear regression seem unreliable?

Thanks Dario. Unfortunately, that does not answer my question. Using this method provides the same slope value that is likely unreliable. — Loulou, Oct 05 '21 at 12:31
I strongly suspect that this question should be "why is my intuition wrong"? rather than "why is my line wrong"? I suspect it has something to do with *leverage* - because you are fixing the first point, small changes to the slope with make large changes in the SSQ associated with the rightmost points in the data set ... — Ben Bolker, Oct 05 '21 at 13:36

G. Grothendieck · Answer 1 · 2021-10-05T16:59:01.660

2

Your model is ok (although possibly the first point should be omitted since it is fixed at a 0 residual) but the plotting command has to transform it back to the original coordinates.

plot(y ~ x, data)
abline(a = intercept, b = coef(lm_model))

Alternately, use a much larger weight for the first point to force it through that point.

fm <- lm(y ~ x, data, weights = replace(0*x+1, 1, 10^5))

plot(y ~ x, data)
abline(fm)

Added

Another possibility is to use two lines rather than one.

# starting values
fm0.1 <- lm(y ~ x + offset(rep(intercept, length = length(y))) + 0,
  data, subset = seq(4))
fm0.2 <- lm(y ~ x, data, subset = -seq(4))
st <- list(b0 = coef(fm0.1)[[1]], a = coef(fm0.2)[[1]], b = coef(fm0.2)[[2]])

f <- function(x, b0, a, b) pmin(b0 * x + intercept, a + b * x)
fm2 <- nls(y ~ f(x, b0, a, b), data, start = st)

plot(y ~ x, data)
rng <- range(data$x)
with(as.list(coef(fm2)), 
  curve(f(x, b0, a, b), from = rng[1], to = rng[2], add = TRUE))

edited Oct 05 '21 at 16:59

answered Oct 05 '21 at 12:20

G. Grothendieck

254,981
17
203
341

Thanks G. Grothendieck. Unfortunately, using your methods returns the same low slope value – Loulou Oct 05 '21 at 12:34
OK. Have revised. – G. Grothendieck Oct 05 '21 at 12:45
Thanks for your answers and edits. However, this least squares regression seems unreliable because of its low slope value. – Loulou Oct 05 '21 at 13:00
1

Maybe it actually *is* low. You can always try a different model. Maybe quantreg::rq? – G. Grothendieck Oct 05 '21 at 13:25
You may be right – Loulou Oct 05 '21 at 14:19
See Added section for another approach. – G. Grothendieck Oct 05 '21 at 16:59

R - linear regression through the first point - why is my intuition wrong?

1 Answers1

Added