0

Summary : I have fitted my data with a regression line constrained through the first point. This least-squares regression (red line) seems unreliable because of its low slope value.

Here are my data :

library(dplyr); library(stats); library(ggplot2)
data = data.frame(y = c(3.86, 5.22, 7.77, 8.62, 6.2, 4.39, 3.76, 3.31, 3.26), x = c(0:8))

I want to fit y on x with a linear regression through the first point.

intercept <- data$y[1]

lm_model = lm( y-intercept~ x + 0, data = data) 

I plot the model (regression line in red) :

data_1 <- data %>% mutate(y_fitted = lm_model$fitted.values + intercept) 

ggplot(data = data_1, aes(x = x)) + 
  geom_point(aes(y = y), color = "black") + 
  geom_path(aes(y = y_fitted), color = "red")

enter image description here

However, the linear regression seems unreliable with a too low slope.

If I multiply the slope by 2.5, I get this new regression that seems more reliable (in orange) :

data_1 <- data_1 %>% mutate(y_fitted_bis = lm_model$fitted.values*2.5 + intercept)

ggplot(data = data_1, aes(x = x)) + 
  geom_point(aes(y = y), color = "black") + 
  geom_path(aes(y = y_fitted), color = "red") +
  geom_path(aes(y = y_fitted_bis), color = "orange")

enter image description here

Why does the red linear regression seem unreliable?

Loulou
  • 703
  • 5
  • 17
  • Thanks Dario. Unfortunately, that does not answer my question. Using this method provides the same slope value that is likely unreliable. – Loulou Oct 05 '21 at 12:31
  • 1
    I strongly suspect that this question should be "why is my intuition wrong"? rather than "why is my line wrong"? I suspect it has something to do with *leverage* - because you are fixing the first point, small changes to the slope with make large changes in the SSQ associated with the rightmost points in the data set ... – Ben Bolker Oct 05 '21 at 13:36
  • Ben Bolker, you're right. I have modified my post – Loulou Oct 05 '21 at 14:22

1 Answers1

2

Your model is ok (although possibly the first point should be omitted since it is fixed at a 0 residual) but the plotting command has to transform it back to the original coordinates.

plot(y ~ x, data)
abline(a = intercept, b = coef(lm_model))

Alternately, use a much larger weight for the first point to force it through that point.

fm <- lm(y ~ x, data, weights = replace(0*x+1, 1, 10^5))

plot(y ~ x, data)
abline(fm)

screenshot

Added

Another possibility is to use two lines rather than one.

# starting values
fm0.1 <- lm(y ~ x + offset(rep(intercept, length = length(y))) + 0,
  data, subset = seq(4))
fm0.2 <- lm(y ~ x, data, subset = -seq(4))
st <- list(b0 = coef(fm0.1)[[1]], a = coef(fm0.2)[[1]], b = coef(fm0.2)[[2]])

f <- function(x, b0, a, b) pmin(b0 * x + intercept, a + b * x)
fm2 <- nls(y ~ f(x, b0, a, b), data, start = st)

plot(y ~ x, data)
rng <- range(data$x)
with(as.list(coef(fm2)), 
  curve(f(x, b0, a, b), from = rng[1], to = rng[2], add = TRUE))

screenshot

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341