Ranking duplicated rows in R

Question

I am trying to create an additional variable (flag) that numbers the repeated observations of each value in my variable, starting from 0.

dataset <- data.frame(id = c(1,1,1,2,2,4,6,6,6,7,7,7,7,8))

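The intended result: within each id, the first row gets flag 0, the second gets 1, and so on (for example, the three rows with id 1 get 0, 1, 2).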

Thank You!

score 2 · Answer 1 · answered Dec 15 '21 at 00:50

You may try

dataset$flag <- unlist(lapply(rle(dataset$id)$lengths, function(x) seq_len(x) - 1))

   id flag
1   1    0
2   1    1
3   1    2
4   2    0
5   2    1
6   4    0
7   6    0
8   6    1
9   6    2
10  7    0
11  7    1
12  7    2
13  7    3
14  8    0
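
To see why this works, it can help to look at the intermediate values. Here is a minimal sketch using the question's data (the names runs and flag are only for illustration):

```r
dataset <- data.frame(id = c(1,1,1,2,2,4,6,6,6,7,7,7,7,8))

# rle() finds runs of consecutive equal values; $lengths is the size of each run
runs <- rle(dataset$id)$lengths
runs
# [1] 3 2 1 3 4 1

# Each run of length n is expanded to 0, 1, ..., n - 1
flag <- unlist(lapply(runs, function(n) seq_len(n) - 1))
flag
# [1] 0 1 2 0 1 0 0 1 2 0 1 2 3 0
```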

Onyambu · Answer 2 · 2021-12-15T00:58:26.833

data.table:

library(data.table)
setDT(dataset)[, flag := rowid(id) - 1]
dataset
    id flag
 1:  1    0
 2:  1    1
 3:  1    2
 4:  2    0
 5:  2    1
 6:  4    0
 7:  6    0
 8:  6    1
 9:  6    2
10:  7    0
11:  7    1
12:  7    2
13:  7    3
14:  8    0

Base R:

dataset$flag <- sequence(rle(dataset$id)$lengths) - 1
dataset
   id flag
1   1    0
2   1    1
3   1    2
4   2    0
5   2    1
6   4    0
7   6    0
8   6    1
9   6    2
10  7    0
11  7    1
12  7    2
13  7    3
14  8    0
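
One caveat, as far as I can tell: rle() counts consecutive runs, so the rle-based version assumes that equal ids sit next to each other, as they do in the question's data. A small sketch with a made-up vector (ids is hypothetical, not from the question) shows how a run-based count and a group-based count can differ:

```r
# Hypothetical input where id 1 reappears after id 2
ids <- c(1, 2, 1, 1)

# Run-based: the counter restarts whenever the value changes
run_based <- sequence(rle(ids)$lengths) - 1
run_based
# [1] 0 0 0 1

# Group-based: the counter continues across non-adjacent repeats
group_based <- ave(ids, ids, FUN = seq_along) - 1
group_based
# [1] 0 0 1 2
```

If the ids might not be sorted, sorting first or using a grouped approach should avoid the surprise.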

score 1 · Answer 3 · answered Dec 15 '21 at 01:06

Another base option:

transform(dataset,
          flag = Reduce(function(x, y) y * x + y, duplicated(id), accumulate = TRUE))

   id flag
1   1    0
2   1    1
3   1    2
4   2    0
5   2    1
6   4    0
7   6    0
8   6    1
9   6    2
10  7    0
11  7    1
12  7    2
13  7    3
14  8    0
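
The Reduce() call is a bit terse, so here is a small sketch of what it does on a shortened id vector (the shortened vector and the names dup and flag are just for illustration). Since y * x + y equals y * (x + 1), the running value resets to 0 whenever y is FALSE (first occurrence) and goes up by one whenever y is TRUE (a repeat):

```r
id <- c(1, 1, 1, 2, 2)

# TRUE marks a value that has already appeared earlier in the vector
dup <- duplicated(id)
dup
# [1] FALSE  TRUE  TRUE FALSE  TRUE

# x is the running counter, y is the current duplicated() flag
flag <- Reduce(function(x, y) y * x + y, dup, accumulate = TRUE)
flag
# [1] 0 1 2 0 1
```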

score 1 · Answer 4 · answered Dec 15 '21 at 01:08

dplyr:

library(dplyr)

dataset %>% group_by(id) %>% mutate(flag = row_number() - 1)

#      id  flag
#   <dbl> <dbl>
# 1     1     0
# 2     1     1
# 3     1     2
# 4     2     0
# 5     2     1
# 6     4     0
# 7     6     0
# 8     6     1
# 9     6     2
#10     7     0
#11     7     1
#12     7     2
#13     7     3
#14     8     0

Base R with similar logic:

transform(dataset, flag = ave(id, id, FUN = seq_along) - 1)

matheus · Answer 5 · 2021-12-15T01:41:49.970

Another way to get the expected result, though it takes a little more code:

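library(dplyr)

# Count the rows for each id
x <- dataset %>%
  group_by(id) %>%
  summarise(nreg = n())

df <- data.frame()

# For each id, build its rows with flag running from 0 to nreg - 1
for (i in seq_len(nrow(x))) {
  flag <- data.frame(id = rep(x$id[i], x$nreg[i]),
                     flag = seq(0, x$nreg[i] - 1))
  df <- rbind(df, flag)
}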
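
The loop above calls rbind() on every iteration, which can get slow as the data grows. A possible variant of the same idea in base R builds the pieces in a list and binds them once at the end. This is only a sketch, and the names groups, pieces and result are made up for the example:

```r
dataset <- data.frame(id = c(1,1,1,2,2,4,6,6,6,7,7,7,7,8))

# One vector of ids per distinct id value
groups <- split(dataset$id, dataset$id)

# For each group, number its rows 0, 1, ..., n - 1
pieces <- lapply(groups, function(v) {
  data.frame(id = v, flag = seq_along(v) - 1)
})

# Bind all pieces in a single step and drop the generated row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Note that split() orders the groups by id value, so the rows come out sorted by id rather than in their original order.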

edited Dec 15 '21 at 01:41

answered Dec 15 '21 at 01:32
