-1

I have a dataframe which looks like this

id year
1  2000
2  2000
1  2001
3  2001
4  2002
5  2002
6  2002
5  2003
6  2003
4  2004
5  2004

I want to progressively count how many consecutive years ids appear in my dataframe. In other words I want to get

id year count
1  2000 1
2  2000 1
1  2001 2
3  2001 1
4  2002 1
5  2002 1
6  2002 1
5  2003 2
6  2003 2
4  2004 1
5  2004 3

Do you have any suggestion? Many thanks, Marco

Marco Mello
  • 175
  • 1
  • 9
  • Try `with(df, ave(year, id, FUN = seq_along))` – Sotos Nov 22 '19 at 12:11
  • Why is `4 2004 1` correct? Shouldn't be `4 2004 2`? – Vitali Avagyan Nov 22 '19 at 12:11
  • 1
    This is an ```rle``` type answer. It's probably a duplicate bit not of the selected duplicate – Cole Nov 22 '19 at 12:14
  • @VitaliAvagyan No, it should be 1, since the previous observation was not in the previous year (i.e. 2003), rather two years before (i.e. 2002). I want to count obs as consecutive only if the id appears in two subsequent years indeed. – Marco Mello Nov 22 '19 at 12:16
  • 1
    Try ```with(DF, ave(year, id, FUN = function(x) {print(diff(x) != 1); ave(x, c(0, cumsum(diff(x)!= 1)), FUN = seq_along)}))``` – Cole Nov 22 '19 at 12:43

2 Answers2

1

Here's a base answer:

with(DF, ave(year, id, FUN = function(x) ave(x, c(0, cumsum(diff(x)!= 1)), FUN = seq_along)))

answer:

library(data.table)
setDT(DF)
DF[, 
   count := rowid(rleid(c(0, cumsum(diff(year) != 1)))),
   by = id][]

And :

library(dplyr)

DF %>% 
  group_by(id) %>%
  group_by(rle_id = c(0, cumsum(diff(year) != 1)), add = T)%>%
  mutate(count = row_number())%>%
  ungroup()

Data:

DF <- 
  read.table(text = 'id year
1  2000
2  2000
1  2001
3  2001
4  2002
5  2002
6  2002
5  2003
6  2003
4  2004
5  2004', header = T)
Cole
  • 11,130
  • 1
  • 9
  • 24
0

This answer to a very similar question makes for a good solution.

https://stackoverflow.com/a/52820446/2862791


library(tibble)
library(dplyr)


cumcount <- function(x){

        #' Credit to SO user `Gaurav Bansal``
        cumcount <- numeric(length(x))
        names(cumcount) <- x

        for(i in 1:length(x)){
                cumcount[i] <- sum(x[1:i]==x[i])
        }

        return(cumcount)
}


t <- tibble(
        id = c(1, 2, 1, 3, 4, 5, 6, 5, 6, 4, 5), 
        year = c(2000, 2000, 2001, 2001, 2002, 2002, 
                 2002, 2003, 2003, 2004, 2004)

)


t %>% 
        group_by(id) %>%
        mutate(count = cumcount(id))

     id  year count
   <dbl> <dbl> <dbl>
 1     1  2000     1
 2     2  2000     1
 3     1  2001     2
 4     3  2001     1
 5     4  2002     1
 6     5  2002     1
 7     6  2002     1
 8     5  2003     2
 9     6  2003     2
10     4  2004     2
11     5  2004     3

JFlynn
  • 344
  • 2
  • 8
  • thank you for your answer. However id 4 in year 2004 should be 1, since the previous observation of id==4 was not in the previous year (i.e. 2003), rather two years before (i.e. 2002). I want to count obs as consecutive only if the id appears in two subsequent years indeed. – Marco Mello Nov 22 '19 at 12:18