I'm working on summer time series of drought period data and trying to identify individual drought periods. My problem is that the code I'm currently using does not recognize when the year changes, so it assigns the same id to the end of one summer and the beginning of the next.
Here's a simplified version of the data I have.
library(dplyr)  # provides tibble() and %>%
myData <- tibble(series = rep("FS",21),
                 date = c("2016-10-26","2016-10-27","2016-10-28","2016-10-29","2016-10-30","2016-10-31","2017-05-01","2017-05-02","2017-05-03","2017-05-04","2017-05-05","2017-05-06","2017-05-07","2017-05-08","2017-05-09","2017-05-10","2017-05-11","2017-05-12","2017-05-13","2017-05-14","2017-05-15"),
                 year = c(rep(2016,6),rep(2017,15)),
                 day_status = c(rep("normal",3),rep("drought",16),rep("normal",2)))
> myData
# A tibble: 21 x 4
   series date        year day_status
   <chr>  <chr>      <dbl> <chr>
 1 FS     2016-10-26  2016 normal
 2 FS     2016-10-27  2016 normal
 3 FS     2016-10-28  2016 normal
 4 FS     2016-10-29  2016 drought
 5 FS     2016-10-30  2016 drought
 6 FS     2016-10-31  2016 drought
 7 FS     2017-05-01  2017 drought
 8 FS     2017-05-02  2017 drought
 9 FS     2017-05-03  2017 drought
10 FS     2017-05-04  2017 drought
# ... with 11 more rows
The result I'm looking for is something like this
> myData2
# A tibble: 21 x 5
   series date        year day_status group
   <chr>  <chr>      <dbl> <chr>      <dbl>
 1 FS     2016-10-26  2016 normal         1
 2 FS     2016-10-27  2016 normal         1
 3 FS     2016-10-28  2016 normal         1
 4 FS     2016-10-29  2016 drought        2
 5 FS     2016-10-30  2016 drought        2
 6 FS     2016-10-31  2016 drought        2
 7 FS     2017-05-01  2017 drought        3
 8 FS     2017-05-02  2017 drought        3
 9 FS     2017-05-03  2017 drought        3
10 FS     2017-05-04  2017 drought        3
# ... with 11 more rows
The code I have been using is
myData$group <- with(myData, rep(seq_along(z<-rle(myData$day_status)$lengths),z))
but it gives the October and May droughts the same id, even though they are separate droughts.
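One possible way to keep the rle() approach while splitting runs that straddle a year boundary is to run it on a combined year-and-status key instead of on day_status alone. The following is a minimal base R sketch under that assumption; the standalone vectors and the names key, z and group are placeholders for illustration and only mirror the sample data above.

```r
# Toy vectors mirroring the example: 6 rows in 2016, 15 rows in 2017
year <- c(rep(2016, 6), rep(2017, 15))
day_status <- c(rep("normal", 3), rep("drought", 16), rep("normal", 2))

# Combine year and status so that a change in either one starts a new run
key <- paste(year, day_status)

# Run lengths of the combined key, then one consecutive id per run
z <- rle(key)$lengths
group <- rep(seq_along(z), z)

print(group)
```

On the sample data this yields four ids, with the October 2016 drought getting id 2 and the May 2017 drought getting id 3. Note that this treats any change of year as a break, which is only appropriate if no drought is meant to continue across a year boundary.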
I then tried using dplyr and group_by to make the function run for one year at a time:
myData %>%
  group_by(year) %>%
  mutate(group = rep(seq_along(z<-rle(myData$day_status)$lengths),z)) %>%
  ungroup() %>%
  {. ->> myData}
but this gives an error:
Error: Column group must be length 6 (the group size) or one, not 21.
I gathered this has something to do with how group_by works, but I don't fully understand what the problem is.
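A likely reason for the length mismatch: inside mutate() on a grouped data frame, myData$day_status always refers to the full 21-row column, whereas the bare name day_status refers only to the rows of the current group (6 rows for 2016). The sketch below illustrates this with a cut-down copy of the data; run_id is a hypothetical helper name introduced here, and a plain data.frame is used instead of a tibble only to keep the example small.

```r
library(dplyr)

# Cut-down copy of the sample data (only the columns needed here)
myData <- data.frame(
  year = c(rep(2016, 6), rep(2017, 15)),
  day_status = c(rep("normal", 3), rep("drought", 16), rep("normal", 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: consecutive run ids for a vector
run_id <- function(x) {
  z <- rle(x)$lengths
  rep(seq_along(z), z)
}

# Using the bare column name means mutate() only sees the current year's
# rows, so the result has the group's length and the error goes away.
per_year <- myData %>%
  group_by(year) %>%
  mutate(group = run_id(day_status)) %>%
  ungroup()

print(per_year$group)
```

With this version the ids restart at 1 within each year, so a drought is identified by the pair (year, group) rather than by group alone.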
Any help is greatly appreciated!