I'm trying to identify runs of consecutive observations, group them and reshape so that start and end of each run occupy a column. Visually:
## REPRODUCIBLE EXAMPLE
> dput(example)
structure(list(id = c(123, 123, 123, 123, 123, 123, 123, 123,
234, 234, 234), date = structure(c(1398816000, 1398902400, 1398988800,
1399075200, 1399161600, 1350777600, 1350864000, 1350950400, 1470009600,
1470096000, 1470182400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
event = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA,
-11L), .Names = c("id", "date", "event"), class = c("tbl_df",
"tbl", "data.frame"))
## GLIMPSE DATA
> dplyr::glimpse(example)
Observations: 11
Variables: 3
$ id <dbl> 123, 123, 123, 123, 123, 123, 123, 123, 234, 234, 234
$ date <dttm> 2014-04-30, 2014-05-01, 2014-05-02, 2014-05-03, 2014-05-04, 2012-10-21, 2012-10-22, 2012-10-23, 2016-08-01, 2016-08-02, 2016-08-03
$ event <fctr> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0
I've broken down the approach as follows:
- group data by
id
rle
to identify runs of consecutive observations withinid
(e.g.rle(example$event > 0)
)- reshape from long to wide where min(date) and max(date) (within runs) become columns
I'm not sure how to proceed. The data.table
solution to a similar question was close, but I wasn't able to re-purpose it.