I have a dataset sorted by VAR1 and need to create a new variable, VAR2, that numbers the rows within each run of identical VAR1 values, as shown below.

VAR1    VAR2 (to be created)
C1      1
C1      2
C1      3
C2      1
C3      1
C3      2
C4      1
C5      1

Thanks for the help in advance.

Frank
Vibhor Kalra

2 Answers

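# rle() gives the length of each run; seq_len() numbers the rows within a run.
# lapply() always returns a list, so unlist() yields a plain vector even when all runs have equal length.
unlist(lapply(rle(as.character(df$VAR1))$lengths, seq_len))
#[1] 1 2 3 1 1 2 1 1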
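
To see the intermediate step, here is a minimal sketch that rebuilds the sample data from the question and prints the run lengths before numbering the rows. The data frame name `df` follows the answer; `stringsAsFactors = FALSE` and the helper name `runs` are assumptions made for illustration.

```r
# Sample data from the question (construction is assumed; only VAR1 is given)
df <- data.frame(VAR1 = c("C1", "C1", "C1", "C2", "C3", "C3", "C4", "C5"),
                 stringsAsFactors = FALSE)

# rle() collapses consecutive repeats into (value, length) pairs
runs <- rle(as.character(df$VAR1))
runs$lengths
# [1] 3 1 2 1 1

# Number the rows inside each run, then flatten into one vector
df$VAR2 <- unlist(lapply(runs$lengths, seq_len))
df$VAR2
# [1] 1 2 3 1 1 2 1 1
```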
Sotos

Here is another option using rle and sequence (if VAR1 is a factor, wrap it in as.character() first, because rle() rejects factors):

with(df1, sequence(rle(VAR1)$lengths))
#[1] 1 2 3 1 1 2 1 1

Or with ave

with(df1, ave(seq_along(VAR1),
        cumsum(c(TRUE, VAR1[-1] != VAR1[-length(VAR1)])), FUN = seq_along))
#[1] 1 2 3 1 1 2 1 1
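
The grouping vector passed to ave can be easier to follow when built step by step. A small sketch, assuming VAR1 holds the question's sample values as a plain character vector; the names `starts` and `run_id` are hypothetical and used only for illustration.

```r
VAR1 <- c("C1", "C1", "C1", "C2", "C3", "C3", "C4", "C5")

# TRUE wherever a value differs from the one before it;
# the first row always starts a run
starts <- c(TRUE, VAR1[-1] != VAR1[-length(VAR1)])

# The cumulative sum of the run starts gives each row a run id
run_id <- cumsum(starts)
run_id
# [1] 1 1 1 2 3 3 4 5

# Count the rows within each run id
VAR2 <- ave(seq_along(VAR1), run_id, FUN = seq_along)
VAR2
# [1] 1 2 3 1 1 2 1 1
```

Grouping by the run id rather than by VAR1 itself means the count restarts whenever the value changes, so a value that reappears later in unsorted data would start again at 1.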

Or using rleid from data.table

library(data.table)
setDT(df1)[, VAR2 := seq_len(.N), by = rleid(VAR1)]
akrun