Adding number if next value the same r

Question

I'm facing the following problem in R. I have a dataframe with values identifing a customer. There is a column with User ID. I need to add another column with a counter what is the occurence number of that particular customer in the data. The dataframe is sorted by User ID. So i have something like that:

> niekonwersyjne[c(57:62,72:77),1]
                     User_ID 
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1qZghdxj4gypoSQRt_F       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs

But need something like this:

> niekonwersyjne[c(57:62,72:77),c(1,11)]
                     User_ID Counter
AMsySZa--1Og4WwseZJKRyABTWdh       1
AMsySZa--1Og4WwseZJKRyABTWdh       2
AMsySZa--1Og4WwseZJKRyABTWdh       3
AMsySZa--1Og4WwseZJKRyABTWdh       4
AMsySZa--1Og4WwseZJKRyABTWdh       5
AMsySZa--1qZghdxj4gypoSQRt_F       1
AMsySZa--2gL6xRCZFUCOXtpYxNs       1
AMsySZa--2gL6xRCZFUCOXtpYxNs       2
AMsySZa--2gL6xRCZFUCOXtpYxNs       3
AMsySZa--2gL6xRCZFUCOXtpYxNs       4
AMsySZa--2gL6xRCZFUCOXtpYxNs       5
AMsySZa--2gL6xRCZFUCOXtpYxNs       6

I can do this with a loop but the data frame has over 20 mil observations so the calculation time is defintely too high. Is there some other way to achieve this result?

The loop that I am using right now looks like this:

niekonwersyjne$Counter<-1

for (i in 2:nrow(niekonwersyjne)) {
  if (niekonwersyjne[i-1,"User_ID"]==niekonwersyjne[i,"User_ID"]) {
    niekonwersyjne[i,"Counter"]<-niekonwersyjne[i-1,"Counter"]+1} else {
       niekonwersyjne[i,"Counter"]<-1
    }
}

Or [Adding a counter column for a set of similar rows in R](http://stackoverflow.com/questions/19848362/adding-a-counter-column-for-a-set-of-similar-rows-in-r) Or [How can I rank observations in-group faster?](http://stackoverflow.com/questions/6162685/how-can-i-rank-observations-in-group-faster) — Ronak Shah, Nov 23 '16 at 08:54
`ave(df$User_ID, df$User_ID, FUN = function(i) seq_along(i))` — Sotos, Nov 23 '16 at 08:54

score 2 · Answer 1 · answered Nov 23 '16 at 09:18

2

Using the dplyr package you can use the following

library(dplyr)
niekonwersyjne %>% group_by(User_ID) %>% mutate(Counter = row_number())

answered Nov 23 '16 at 09:18

PhillipD

1,797
1
13
23

score 2 · Answer 2 · answered Nov 23 '16 at 09:20

2

Another dplyr answer...

df %>% group_by(User_ID) %>% mutate(ct = 1, counter = cumsum(ct))

answered Nov 23 '16 at 09:20

Jacob

3,437
3
18
31

score 1 · Accepted Answer · answered Nov 23 '16 at 09:14

I find the data.table method quite nice:

library( data.table )
setDT( df )[ , counter := seq_len( .N ), by = User_ID ]

This "splits" the data into subsets based on the by parameter (here User_ID) and adds a sequence to each group, the same length as the group itself.

Or with dplyr

library( dplyr )
df <- df %>%
    group_by( User_ID ) %>%
    mutate( counter = seq_len( n() ) )

score 1 · Answer 4 · answered Nov 23 '16 at 09:33

1

We can use table with sequence in base R

df1$Counter <- unname(sequence(table(df1$User_ID)))
df1$Counter
#[1] 1 2 3 4 5 1 1 2 3 4 5 6

answered Nov 23 '16 at 09:33

akrun

874,273
37
540
662

Adding number if next value the same r

4 Answers4