Count consecutive observations within grouped data in R

Question

I have a dataframe which looks like this

I want to progressively count how many consecutive years ids appear in my dataframe. In other words I want to get

id year count
1  2000 1
2  2000 1
1  2001 2
3  2001 1
4  2002 1
5  2002 1
6  2002 1
5  2003 2
6  2003 2
4  2004 1
5  2004 3

Do you have any suggestion? Many thanks, Marco

This is an ```rle``` type answer. It's probably a duplicate bit not of the selected duplicate — Cole, Nov 22 '19 at 12:14
@VitaliAvagyan No, it should be 1, since the previous observation was not in the previous year (i.e. 2003), rather two years before (i.e. 2002). I want to count obs as consecutive only if the id appears in two subsequent years indeed. — Marco Mello, Nov 22 '19 at 12:16
Try ```with(DF, ave(year, id, FUN = function(x) {print(diff(x) != 1); ave(x, c(0, cumsum(diff(x)!= 1)), FUN = seq_along)}))``` — Cole, Nov 22 '19 at 12:43

score 1 · Accepted Answer · answered Nov 22 '19 at 12:51

Here's a base answer:

with(DF, ave(year, id, FUN = function(x) ave(x, c(0, cumsum(diff(x)!= 1)), FUN = seq_along)))

data.table answer:

library(data.table)
setDT(DF)
DF[, 
   count := rowid(rleid(c(0, cumsum(diff(year) != 1)))),
   by = id][]

And dplyr:

library(dplyr)

DF %>% 
  group_by(id) %>%
  group_by(rle_id = c(0, cumsum(diff(year) != 1)), add = T)%>%
  mutate(count = row_number())%>%
  ungroup()

Data:

DF <- 
  read.table(text = 'id year
1  2000
2  2000
1  2001
3  2001
4  2002
5  2002
6  2002
5  2003
6  2003
4  2004
5  2004', header = T)

score 0 · Answer 2 · answered Nov 22 '19 at 12:08

This answer to a very similar question makes for a good solution.

https://stackoverflow.com/a/52820446/2862791


library(tibble)
library(dplyr)


cumcount <- function(x){

        #' Credit to SO user `Gaurav Bansal``
        cumcount <- numeric(length(x))
        names(cumcount) <- x

        for(i in 1:length(x)){
                cumcount[i] <- sum(x[1:i]==x[i])
        }

        return(cumcount)
}


t <- tibble(
        id = c(1, 2, 1, 3, 4, 5, 6, 5, 6, 4, 5), 
        year = c(2000, 2000, 2001, 2001, 2002, 2002, 
                 2002, 2003, 2003, 2004, 2004)

)


t %>% 
        group_by(id) %>%
        mutate(count = cumcount(id))

     id  year count
   <dbl> <dbl> <dbl>
 1     1  2000     1
 2     2  2000     1
 3     1  2001     2
 4     3  2001     1
 5     4  2002     1
 6     5  2002     1
 7     6  2002     1
 8     5  2003     2
 9     6  2003     2
10     4  2004     2
11     5  2004     3

thank you for your answer. However id 4 in year 2004 should be 1, since the previous observation of id==4 was not in the previous year (i.e. 2003), rather two years before (i.e. 2002). I want to count obs as consecutive only if the id appears in two subsequent years indeed. — Marco Mello, Nov 22 '19 at 12:18

Count consecutive observations within grouped data in R

2 Answers2