2

I have an R dataframe like the one below which contains each user and a list of days they were active.

User    date_active          
1       2018-08-27  
1       2018-08-28
1       2018-08-29 
1       2018-08-31
1       2018-09-01
2       2018-07-05
2       2018-07-06
2       2018-07-07
2       2018-07-08
2       2018-07-10
2       2018-07-11
2       2018-07-12
2       2018-07-20

I would like to count all the dates up until a gap, in which the count restarts. My expected output would look like the below table for the above data.

User date_active          
1    3  
1    2
2    4 
2    3
2    1

My initial guess was to use the rank function or difftime but am unsure how to get the result. Any help would be most appreciated!

quicklegit
  • 137
  • 1
  • 6
  • 2
    I think this should get you going: [How to partition a vector into groups of regular, consecutive sequences?](https://stackoverflow.com/questions/5222061/how-to-partition-a-vector-into-groups-of-regular-consecutive-sequences). It's a good, canonical post on how to create a grouping variable using the `cumsum(diff(...` idiom. – Henrik Oct 24 '18 at 09:24
  • Ah yes, my bad! – quicklegit Oct 24 '18 at 10:28

1 Answers1

6

With dplyr we could group_by User and number of days when difference of days is not equal to 1. We count the number of rows in each group then.

library(dplyr)

df %>%
  group_by(User, days = c(0, cumsum(diff(date_active) != 1))) %>%
  summarise(date_active = n()) %>%
  select(-days)

#   User date_active
#  <int>       <int>
#1     1           3
#2     1           2
#3     2           4
#4     2           3
#5     2           1

This is assuming that your date_active is of class Date or else convert it to Date first by

df$date_active <- as.Date(df$date_active)

Creating a grouping variable is the key here. See

c(0, cumsum(diff(df$date_active) != 1))
# [1] 0 0 0 1 1 2 2 2 2 3 3 3 4
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213