Identify Runs of Consecutive Obs by Group and Reshape

Question

I'm trying to identify runs of consecutive observations, group them and reshape so that start and end of each run occupy a column. Visually:

## REPRODUCIBLE EXAMPLE
> dput(example)
structure(list(id = c(123, 123, 123, 123, 123, 123, 123, 123, 
234, 234, 234), date = structure(c(1398816000, 1398902400, 1398988800, 
1399075200, 1399161600, 1350777600, 1350864000, 1350950400, 1470009600, 
1470096000, 1470182400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    event = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 
    1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA, 
-11L), .Names = c("id", "date", "event"), class = c("tbl_df", 
"tbl", "data.frame"))

## GLIMPSE DATA
> dplyr::glimpse(example)
Observations: 11
Variables: 3
$ id    <dbl> 123, 123, 123, 123, 123, 123, 123, 123, 234, 234, 234
$ date  <dttm> 2014-04-30, 2014-05-01, 2014-05-02, 2014-05-03, 2014-05-04, 2012-10-21, 2012-10-22, 2012-10-23, 2016-08-01, 2016-08-02, 2016-08-03
$ event <fctr> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0

I've broken down the approach as follows:

group data by id
rle to identify runs of consecutive observations within id (e.g. rle(example$event > 0))
reshape from long to wide where min(date) and max(date) (within runs) become columns

I'm not sure how to proceed. The data.table solution to a similar question was close, but I wasn't able to re-purpose it.

Group by id, sort by date, get min date for event.start, get max date for event.stop. — zx8754, Oct 05 '16 at 15:22
Oh, do you mean there will be more than 1 start end events per 1 id? — zx8754, Oct 05 '16 at 15:23
@zx8754 **Yes**: there will be multiple runs per ID, hence multiple starts and end dates. My minimal example did not capture that. — Thomas Speidel, Oct 05 '16 at 15:27
Please update your example data, make is easier to test for us. — zx8754, Oct 05 '16 at 15:37
OK, here is a start see [this post](http://stackoverflow.com/questions/39511614), the idea is to create group on id, then create group ids using events column using linked post method. Then group by id and new group, min max date... — zx8754, Oct 05 '16 at 15:45
Updated reproducible example to reflect multiple runs per ID — Thomas Speidel, Oct 05 '16 at 15:45

score 1 · Accepted Answer · edited May 23 '17 at 12:33

1

Stealing the idea from the other post:

df1 %>% 
  mutate(eventGroup = data.table::rleid(event)) %>% 
  filter(event == 1) %>% 
  group_by(id, eventGroup) %>% 
  summarise(start = min(date),
            end = max(date))

#      id eventGroup      start        end
# 1   123          2 2014-05-01 2014-05-03
# 2   123          4 2012-10-22 2012-10-22
# 3   234          6 2016-08-02 2016-08-02

edited May 23 '17 at 12:33

Community

1
1

answered Oct 05 '16 at 15:52

zx8754

52,746
12
114
209

Thanks! This works great and I personally like the tidyverse approach with a bit of data.table! – Thomas Speidel Oct 05 '16 at 19:32
@ThomasSpeidel some might find this ugly, better to stick with one ecosystem. – zx8754 Oct 06 '16 at 13:11

score 1 · Answer 2 · answered Oct 05 '16 at 16:06

Here's another option:

library(data.table)
setDT(ex)[,rl:=rleid(event),by=id][event=="1",.(start=min(date),stop=max(date)),by="id,rl"][,rl:=NULL][]
#     id      start       stop
# 1: 123 2014-05-01 2014-05-03
# 2: 123 2012-10-22 2012-10-22
# 3: 234 2016-08-02 2016-08-02

Identify Runs of Consecutive Obs by Group and Reshape

2 Answers2