Segmented linear regression with discontinuous data

Question

I have a dataset that looks to be piecewise linear. I would like to perform a segmented linear regression in R. The issue is that there is a discontinuity at the breakpoint. By using some pieces of code from this question I managed to get something, but I am not satisfied.

Dataset

Here is a dummy dataset.

NB = 100
A1 = 2 # coeff for first part
A2 = 1 # coeff for second part
B1 = 0 # intercept for first part
B2 = 300 # intercept for second part
# Predictor: NB integers drawn from 1..500 (with replacement)
df = data.frame(n = sample(500, size=NB, replace=TRUE))
# Noise: integers in -9..10
df$noise = sample(20, size=NB, replace=TRUE)-10
my_func <- function(n, noise) {
    if(n < 100) {
        return(A1*n+B1 + noise)
    }
    else {
        return(A2*n+B2 + noise)
    }
}
df$fn = mapply(my_func, df$n, df$noise)
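
Written out, the generating model behind this dummy dataset and the size of its jump at the break are as follows. Here ε stands for the integer noise term, which takes values in -9..10 because it is drawn as sample(20) - 10.

```latex
f(n) =
\begin{cases}
  2n + \varepsilon,       & n < 100 \\
  n + 300 + \varepsilon,  & n \ge 100
\end{cases}
\qquad
\text{jump at } n = 100:\quad (100 + 300) - (2 \cdot 100) = 200
```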

Using `segmented` package

This is quite straightforward: we fit a classical linear regression and pass it to segmented.

library(segmented)
library(ggplot2)
model_segmented = segmented(lm(fn~n, data=df), seg.Z = ~ n)
predict_segmented = data.frame(n = df$n, fn = broken.line(model_segmented)$fit)
ggplot(df, aes(x = n, y = fn)) +
    geom_point() + geom_line(data = predict_segmented, color = 'blue')

Gives:

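segmented fits a continuous broken line: the two segments are constrained to meet at the estimated breakpoint. The data here jump at the breakpoint, so the fitted line cannot follow them and the regression is not correct.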
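
A quick way to check this without segmented is to compare, in base R, a hinge model (continuous at the break) with a model that allows a jump. This is a minimal sketch, not part of the original code: it regenerates data like the dummy dataset, the fixed seed is an assumption made only for reproducibility, and the true break at 100 is treated as known.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
NB <- 100
n <- sample(500, size = NB, replace = TRUE)
noise <- sample(20, size = NB, replace = TRUE) - 10
df <- data.frame(n = n, fn = ifelse(n < 100, 2 * n, n + 300) + noise)

psi <- 100  # true break-point, taken as known for this comparison

# Continuous broken line: a single hinge term, so no jump is possible at psi
fit_cont <- lm(fn ~ n + pmax(n - psi, 0), data = df)

# Discontinuous: separate intercept and slope on each side of psi
fit_disc <- lm(fn ~ n * (n >= psi), data = df)

# Residual standard error of each fit
rse <- c(continuous = summary(fit_cont)$sigma,
         discontinuous = summary(fit_disc)$sigma)
print(rse)
```

On data like these, the discontinuous fit should have a residual standard error close to the noise level, while the continuous one should be much larger.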

“Manual” method

This method is more tedious. First, we find the break-point by trying every candidate value and keeping the one whose fit has the lowest residual standard error. Then, we add a new term to the linear regression that indicates whether the predictor is below this break-point or not.

# Computation of the break-point
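# Candidates: every distinct value of n except the two extremes
Break<-sort(unique(df$n))
Break<-Break[2:(length(Break)-1)]
d<-numeric(length(Break))
for (i in seq_along(Break)) {
    # Separate intercept and slope on each side of the candidate
    model_manual<-lm(fn~n*(n>=Break[i]), data=df)
    d[i]<-summary(model_manual)$sigma  # residual standard error
}
breakpoint = Break[which.min(d)]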

# Linear regression using this break-point
df$group = df$n >= breakpoint
model_manual<-lm(fn~n*group, data=df)
dat_pred = data.frame(n = df$n, fn = predict(model_manual, df))
ggplot(df, aes(x = n, y = fn)) +
    geom_point() +
    geom_line(data=dat_pred[dat_pred$n < breakpoint,], color = 'blue') +
    geom_line(data=dat_pred[dat_pred$n >= breakpoint,], color = 'blue')

Gives:

Here, the fitted lines follow both segments closely.
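
For reuse, the two steps of this method can be wrapped in one function. The following is a minimal sketch in base R: fit_discontinuous is a hypothetical helper name, not a function from any package, and the data are regenerated with a fixed seed (an assumption made only for reproducibility).

```r
# Hypothetical helper (not from any package): grid search for a single
# break-point, allowing both a jump and a slope change at the break.
fit_discontinuous <- function(x, y) {
    candidates <- sort(unique(x))
    # Drop the two extreme values, as in the loop above
    candidates <- candidates[2:(length(candidates) - 1)]
    # Residual standard error of the fit for each candidate break
    rse <- vapply(candidates, function(b) {
        summary(lm(y ~ x * (x >= b)))$sigma
    }, numeric(1))
    best <- candidates[which.min(rse)]
    list(breakpoint = best, model = lm(y ~ x * (x >= best)))
}

# Data generated like the dummy dataset
set.seed(42)  # assumption: fixed seed, only for reproducibility
n <- sample(500, size = 100, replace = TRUE)
noise <- sample(20, size = 100, replace = TRUE) - 10
df <- data.frame(n = n, fn = ifelse(n < 100, 2 * n, n + 300) + noise)

res <- fit_discontinuous(df$n, df$fn)
print(res$breakpoint)   # smallest observed n assigned to the second segment
print(coef(res$model))
```

Because only observed values are tried, the reported break-point is the first observed n of the second segment rather than exactly 100.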

Question

Is there a better way to achieve this goal? Can the segmented package take discontinuous data, or is there a package that can do this?

My concern is that the second method is a bit long and not very readable.

RanchoDelRey · Answer 1 · 2019-12-21T05:02:53.627

3

After spending a tremendous amount of time digging, I believe the chngpt package is the way to go. It can do both continuous and discontinuous segmented regressions. Link here: https://cran.r-project.org/web/packages/chngpt/vignettes/chngpt-vignette.pdf

edited Dec 21 '19 at 05:02

answered Dec 10 '19 at 04:38

RanchoDelRey


Thanks for pointing out the chngpt package. I think this answer should be accepted. chngpt can do this using the "stegmented" model. Note the "t". This is not a typo. The model you want is called "stegmented" as opposed to "segmented". I emphasize it because it took me a bit of time to figure it out. – stacksia Jun 22 '20 at 22:41

score 2 · Answer 2 · answered Sep 07 '18 at 23:43

2

strucchange will detect the breakpoint using statistically valid methods. Then, you can fit each piece with whatever model you want. For example, with a seasonal time series you can apply separate ARIMA models to each segment.
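
As a rough illustration of that workflow on data like the question's, the sketch below calls breakpoints() from strucchange and then fits a plain lm() to each piece. The fixed seed, the single break (breaks = 1) and the minimum segment size (h = 5 observations) are assumptions chosen for this example, and the code is skipped if strucchange is not installed.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
n <- sample(500, size = 100, replace = TRUE)
noise <- sample(20, size = 100, replace = TRUE) - 10
df <- data.frame(n = n, fn = ifelse(n < 100, 2 * n, n + 300) + noise)

if (requireNamespace("strucchange", quietly = TRUE)) {
    # breakpoints() works on the row order, so sort by the predictor first
    df_sorted <- df[order(df$n), ]
    # One break; h = 5 is an assumed minimum number of observations per segment
    bp <- strucchange::breakpoints(fn ~ n, data = df_sorted, h = 5, breaks = 1)
    last_left <- bp$breakpoints  # row index that ends the first segment

    # Fit each piece separately, here with a simple linear model
    fit_left  <- lm(fn ~ n, data = df_sorted[seq_len(last_left), ])
    fit_right <- lm(fn ~ n, data = df_sorted[-seq_len(last_left), ])

    print(df_sorted$n[last_left])  # largest n in the first segment
    print(rbind(left = coef(fit_left), right = coef(fit_right)))
}
```

The break is reported as a row index in the sorted data, so it has to be mapped back to a value of n as done above.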

Dave B.
