1

I'm trying to identify runs of consecutive observations, group them and reshape so that start and end of each run occupy a column. Visually:

Example

## REPRODUCIBLE EXAMPLE
> dput(example)
structure(list(id = c(123, 123, 123, 123, 123, 123, 123, 123, 
234, 234, 234), date = structure(c(1398816000, 1398902400, 1398988800, 
1399075200, 1399161600, 1350777600, 1350864000, 1350950400, 1470009600, 
1470096000, 1470182400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    event = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 
    1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA, 
-11L), .Names = c("id", "date", "event"), class = c("tbl_df", 
"tbl", "data.frame"))

## GLIMPSE DATA
> dplyr::glimpse(example)
Observations: 11
Variables: 3
$ id    <dbl> 123, 123, 123, 123, 123, 123, 123, 123, 234, 234, 234
$ date  <dttm> 2014-04-30, 2014-05-01, 2014-05-02, 2014-05-03, 2014-05-04, 2012-10-21, 2012-10-22, 2012-10-23, 2016-08-01, 2016-08-02, 2016-08-03
$ event <fctr> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0

I've broken down the approach as follows:

  1. group data by id
  2. rle to identify runs of consecutive observations within id (e.g. rle(example$event > 0))
  3. reshape from long to wide where min(date) and max(date) (within runs) become columns

I'm not sure how to proceed. The data.table solution to a similar question was close, but I wasn't able to re-purpose it.

Community
  • 1
  • 1
Thomas Speidel
  • 1,369
  • 1
  • 14
  • 26
  • Group by id, sort by date, get min date for event.start, get max date for event.stop. – zx8754 Oct 05 '16 at 15:22
  • Oh, do you mean there will be more than 1 start end events per 1 id? – zx8754 Oct 05 '16 at 15:23
  • @zx8754 **Yes**: there will be multiple runs per ID, hence multiple starts and end dates. My minimal example did not capture that. – Thomas Speidel Oct 05 '16 at 15:27
  • Please update your example data, make is easier to test for us. – zx8754 Oct 05 '16 at 15:37
  • 1
    OK, here is a start see [this post](http://stackoverflow.com/questions/39511614), the idea is to create group on id, then create group ids using events column using linked post method. Then group by id and new group, min max date... – zx8754 Oct 05 '16 at 15:45
  • 1
    Updated reproducible example to reflect multiple runs per ID – Thomas Speidel Oct 05 '16 at 15:45

2 Answers2

1

Stealing the idea from the other post:

df1 %>% 
  mutate(eventGroup = data.table::rleid(event)) %>% 
  filter(event == 1) %>% 
  group_by(id, eventGroup) %>% 
  summarise(start = min(date),
            end = max(date))

#      id eventGroup      start        end
# 1   123          2 2014-05-01 2014-05-03
# 2   123          4 2012-10-22 2012-10-22
# 3   234          6 2016-08-02 2016-08-02
Community
  • 1
  • 1
zx8754
  • 52,746
  • 12
  • 114
  • 209
1

Here's another option:

library(data.table)
setDT(ex)[,rl:=rleid(event),by=id][event=="1",.(start=min(date),stop=max(date)),by="id,rl"][,rl:=NULL][]
#     id      start       stop
# 1: 123 2014-05-01 2014-05-03
# 2: 123 2012-10-22 2012-10-22
# 3: 234 2016-08-02 2016-08-02
lukeA
  • 53,097
  • 5
  • 97
  • 100