-2

I'm facing the following problem in R. I have a dataframe with values identifing a customer. There is a column with User ID. I need to add another column with a counter what is the occurence number of that particular customer in the data. The dataframe is sorted by User ID. So i have something like that:

> niekonwersyjne[c(57:62,72:77),1]
                     User_ID 
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1Og4WwseZJKRyABTWdh       
AMsySZa--1qZghdxj4gypoSQRt_F       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       
AMsySZa--2gL6xRCZFUCOXtpYxNs       

But need something like this:

> niekonwersyjne[c(57:62,72:77),c(1,11)]
                     User_ID Counter
AMsySZa--1Og4WwseZJKRyABTWdh       1
AMsySZa--1Og4WwseZJKRyABTWdh       2
AMsySZa--1Og4WwseZJKRyABTWdh       3
AMsySZa--1Og4WwseZJKRyABTWdh       4
AMsySZa--1Og4WwseZJKRyABTWdh       5
AMsySZa--1qZghdxj4gypoSQRt_F       1
AMsySZa--2gL6xRCZFUCOXtpYxNs       1
AMsySZa--2gL6xRCZFUCOXtpYxNs       2
AMsySZa--2gL6xRCZFUCOXtpYxNs       3
AMsySZa--2gL6xRCZFUCOXtpYxNs       4
AMsySZa--2gL6xRCZFUCOXtpYxNs       5
AMsySZa--2gL6xRCZFUCOXtpYxNs       6

I can do this with a loop but the data frame has over 20 mil observations so the calculation time is defintely too high. Is there some other way to achieve this result?

The loop that I am using right now looks like this:

niekonwersyjne$Counter<-1

for (i in 2:nrow(niekonwersyjne)) {
  if (niekonwersyjne[i-1,"User_ID"]==niekonwersyjne[i,"User_ID"]) {
    niekonwersyjne[i,"Counter"]<-niekonwersyjne[i-1,"Counter"]+1} else {
       niekonwersyjne[i,"Counter"]<-1
    }
}
Kitira
  • 9
  • 1
  • 2
    Or [Adding a counter column for a set of similar rows in R](http://stackoverflow.com/questions/19848362/adding-a-counter-column-for-a-set-of-similar-rows-in-r) Or [How can I rank observations in-group faster?](http://stackoverflow.com/questions/6162685/how-can-i-rank-observations-in-group-faster) – Ronak Shah Nov 23 '16 at 08:54
  • 1
    `ave(df$User_ID, df$User_ID, FUN = function(i) seq_along(i))` – Sotos Nov 23 '16 at 08:54

4 Answers4

2

Using the dplyr package you can use the following

library(dplyr)
niekonwersyjne %>% group_by(User_ID) %>% mutate(Counter = row_number())
PhillipD
  • 1,797
  • 1
  • 13
  • 23
2

Another dplyr answer...

df %>% group_by(User_ID) %>% mutate(ct = 1, counter = cumsum(ct))
Jacob
  • 3,437
  • 3
  • 18
  • 31
1

I find the data.table method quite nice:

library( data.table )
setDT( df )[ , counter := seq_len( .N ), by = User_ID ]

This "splits" the data into subsets based on the by parameter (here User_ID) and adds a sequence to each group, the same length as the group itself.

Or with dplyr

library( dplyr )
df <- df %>%
    group_by( User_ID ) %>%
    mutate( counter = seq_len( n() ) )
rosscova
  • 5,430
  • 1
  • 22
  • 35
1

We can use table with sequence in base R

df1$Counter <- unname(sequence(table(df1$User_ID)))
df1$Counter
#[1] 1 2 3 4 5 1 1 2 3 4 5 6
akrun
  • 874,273
  • 37
  • 540
  • 662