0

I have daily rainfall data which I have converted to yearwise cumulative value using following code

library(seas)
library(data.table)
library(ggplot2)

#Loading data
data(mscdata)
dat <- (mksub(mscdata, id=1108447))
dat$julian.date <- as.numeric(format(dat$date, "%j"))
DT <- data.table(dat)
DT[, Cum.Sum := cumsum(rain), by=list(year)]

df <- cbind.data.frame(day=dat$julian.date,cumulative=DT$Cum.Sum)

Then I want to apply segmented regression year-wise to have year-wise breakpoints. I could able to do it for single year like

library("segmented")
x <- subset(dat,year=="1984")$julian.date
y <- subset(DT,year=="1984")$Cum.Sum
fit.lm<-lm(y~x)
segmented(fit.lm, seg.Z = ~ x, npsi=3)

I have used npsi = 3 to have 3 breakpoints. Now how to dinimically apply it year-wise segmented regression and have the estimated breakpoints?

UseR10085
  • 7,120
  • 3
  • 24
  • 54

2 Answers2

1

Here's a short script to come out with a customised function so that you can run the different yearwise regressions.

## using tidyverse processes instead of mixing and matching with other data manipulation packages 
library(tidyverse); library(segmented); library(seas)

## get mscdata from "seas" packages
data(mscdata)
dat <- (mksub(mscdata, id=1108447))

## generate cumulative sum of rain by year
d2 <- dat %>% group_by(year) %>% mutate(rain_cs = cumsum(rain)) %>% ungroup

## write a custom function

segmentedlm <- function(data, year){
  subset.df <- data %>% filter(year == year)
  fit.lm <- lm(rain_cs ~ julian.date, subset.df)
  segmented(fit.lm, seg.Z = ~ julian.date, npsi=3)
}

# run the customised function for 1975 data
segmentedlm(d2, "1975") %>% plot(., main="1975")

enter image description here

segmentedlm(d2, "1984") %>% plot(., main = "1984")

enter image description here

To output the summary of segmented linear models of multiple years into a text file:

sink("output.txt")
lapply(c("1975", "1984"), function(x) segmentedlm(d2, x))
sink()

You can change the argument for lapply to input all the years.

Adam Quek
  • 6,973
  • 1
  • 17
  • 23
  • Plotting is ok, but can it be possible to run for all the years at a time and have the breakpoints in .csv file. – UseR10085 Jun 25 '20 at 14:11
  • what sort of output do you want on a csv file? – Adam Quek Jun 25 '20 at 14:20
  • If you run `segmentedlm(d2, "1975")` this only, then I want to write this ```psi1.julian.date psi2.julian.date psi3.julian.date 81.5 152.0 285.7``` for all the years one below the another with row names be years. – UseR10085 Jun 25 '20 at 14:27
  • Amended the answer above so that you can write the console output into a text file – Adam Quek Jun 25 '20 at 14:33
  • An alternative would be using `broom::tidy` to the console output for each segmented linear model into a table. But combining them all into a single csv will be mighty confusing. – Adam Quek Jun 25 '20 at 14:34
  • Your output is having 5 columns, but it should have 3 columns only for the three breakpoints for each year. With every run output changes, can it be fixed with `set.seed`?. – UseR10085 Jun 25 '20 at 14:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/216644/discussion-between-bappa-das-and-adam-quek). – UseR10085 Jun 25 '20 at 14:52
1

You can store the lm object in a list and apply segmented for each year.

library(tidyverse)

data <- DT %>%
         group_by(year) %>%
         summarise(fit.lm = list(lm(Cum.Sum~julian.date)), 
                   julian.date1 = list(julian.date)) %>%
         mutate(out = map2(fit.lm, julian.date1, function(x, julian.date) 
                       data.frame(segmented::segmented(x, 
                                  seg.Z = ~julian.date, npsi=3)$psi))) %>%
         unnest_wider(out) %>%
         unnest(cols = c(Initial, Est., St.Err)) %>%
         dplyr::select(-fit.lm, -julian.date1)

# A tibble: 90 x 4
#    year Initial  Est. St.Err
#   <int>   <dbl> <dbl>  <dbl>
# 1  1975    84.8  68.3  1.44 
# 2  1975   168.  167.   9.31 
# 3  1975   282.  281.   0.917
# 4  1976    84.8  68.3  1.44 
# 5  1976   168.  167.   9.33 
# 6  1976   282.  281.   0.913
# 7  1977    84.8  68.3  1.44 
# 8  1977   168.  167.   9.32 
# 9  1977   282.  281.   0.913
#10  1978    84.8  68.3  1.44 
# … with 80 more rows
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • How to make the list output to dataframe? Just I want to have the breakpoints (psi) as the output with year as row name. You can use `segmented(x, seg.Z = ~julian.date, npsi=3)$psi` – UseR10085 Jun 26 '20 at 10:20
  • `segmented(fit.lm, seg.Z = ~ x, npsi=3)$psi` returns a 3 X 3 matrix so there will be 9 values. So do you want to have 9 columns? – Ronak Shah Jun 26 '20 at 10:29
  • First column is the initial breakpoint, 2nd column is the final estimate of the breakpoint and the 3rd is standard errors. If we can have the 2nd column, it will serve the purpose. All in the output will be the best with respective names. – UseR10085 Jun 26 '20 at 10:39
  • @BappaDas but then every year would have. 3 rows right? – Ronak Shah Jun 26 '20 at 11:52
  • Yes, if you keep only the second column of the matrix. – UseR10085 Jun 26 '20 at 11:52
  • It's the correct one but if you see the output, the Est. are same for all the years. It is not possible. I think some problem is there in the `segmented` package itself or any other problem I don't know. It was the same case with the answer of @Adam Quek also. – UseR10085 Jun 26 '20 at 12:09
  • 1
    Yeah...I see that. I am not sure how that works. If you take output till `summarise` and store it in `data`, we can see there are different models `data$fit.lm[[1]]` , `data$fit.lm[[2]]`, but if you put them in `segmented` they generate the same output. `segmented::segmented(data$fit.lm[[1]], npsi=3)$psi` , `segmented::segmented(data$fit.lm[[2]], npsi=3)$psi` – Ronak Shah Jun 26 '20 at 15:15
  • Under such situations, what can be done? Should I accept the answer? In the meantime you can visit [this](https://stackoverflow.com/questions/62129863/how-to-melt-pairwise-wilcox-test-output-using-dplyr), I think you will be able to answer it. – UseR10085 Jun 26 '20 at 17:13