
I have a data frame called 'batsmen' with close to 100k rows.

One column is called 'Inns'. Its values look like this: {1,1,1,1,2,2,2,1,1,1,1,1,2,2,2,2,0,0,1,1,1,1,1,2,2,2,2,2,2,2...}

I want to define a new column 'Position' in the same data frame. It should be a conditional integer sequence (like seq.int): it starts at 1 and counts up until 'Inns' changes value, at which point 'Position' restarts from 1. So for the 'Inns' example above, 'Position' should look like this: {1,2,3,4,1,2,3,1,2,3,4,5,1,2,3,4,1,2,1,2,3,4,5,1,2,3,4,5,6,7....}

I can do this with a for loop, but I don't want to lose run time, because this is only a small step in the overall program. Can you suggest a simple way to do it without a for loop?
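For reference, the loop the question wants to avoid might look like the sketch below. This is only an assumption of what that loop would be, and the data frame is a small hypothetical stand-in for 'batsmen' holding just the 'Inns' values quoted above. It is mainly useful for checking the vectorised answers against.

```r
# Hypothetical stand-in for the question's 'batsmen' data frame,
# holding only the 'Inns' values quoted in the question.
batsmen <- data.frame(
  Inns = c(1,1,1,1,2,2,2,1,1,1,1,1,2,2,2,2,0,0,1,1,1,1,1,2,2,2,2,2,2,2)
)

# Loop-based reference: reset the counter whenever Inns changes value.
position_loop <- function(inns) {
  pos <- integer(length(inns))  # preallocate instead of growing the vector
  for (i in seq_along(inns)) {
    # '||' short-circuits, so inns[i - 1] is never read when i == 1
    if (i == 1 || inns[i] != inns[i - 1]) {
      pos[i] <- 1L
    } else {
      pos[i] <- pos[i - 1] + 1L
    }
  }
  pos
}

batsmen$Position <- position_loop(batsmen$Inns)
head(batsmen$Position, 12)
# [1] 1 2 3 4 1 2 3 1 2 3 4 5
```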

2 Answers


You can use data.table::rleid, which creates an id for each run of consecutive equal values, and use that id as the grouping variable for the sequence:

x <- c(1,1,1,1,2,2,2,1,1,1,1,1,2,2,2,2,0,0,1,1,1,1,1,2,2,2,2,2,2,2)
ave(x, data.table::rleid(x), FUN = seq_along)
# [1] 1 2 3 4 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1 2 3 4 5 1 2 3 4 5 6 7
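If data.table is not available, run ids like the ones rleid produces can be built in base R from rle(). This is a sketch, and the helper name run_id is hypothetical; like rleid, it gives each run of consecutive equal values its own id, which can then be passed to ave() in the same way:

```r
x <- c(1,1,1,1,2,2,2,1,1,1,1,1,2,2,2,2,0,0,1,1,1,1,1,2,2,2,2,2,2,2)

# Hypothetical helper: one id per run of consecutive equal values,
# built by repeating 1, 2, 3, ... as many times as each run is long.
run_id <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), times = r$lengths)
}

run_id(x)
# [1] 1 1 1 1 2 2 2 3 3 3 3 3 4 4 4 4 5 5 6 6 6 6 6 7 7 7 7 7 7 7

# Used as the grouping variable, it yields the same Position sequence
ave(x, run_id(x), FUN = seq_along)
```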

Or you can use base R's diff and cumsum to create the grouping variable:

ave(x, cumsum(c(FALSE, diff(x) != 0)), FUN = seq_along)
# [1] 1 2 3 4 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1 2 3 4 5 1 2 3 4 5 6 7
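Printing the intermediate vectors shows what the diff/cumsum grouping contains. This sketch assumes 'Inns' is numeric (diff() does not work on character vectors) and uses a small hypothetical stand-in for 'batsmen'. Because ave() returns a vector of the same type as its input, a numeric column gives double results, so the last line converts to integer:

```r
x <- c(1,1,1,1,2,2,2,1,1,1,1,1,2,2,2,2,0,0,1,1,1,1,1,2,2,2,2,2,2,2)

# TRUE wherever a value differs from the one before it; the leading FALSE
# keeps the vector as long as x (the first element always starts a run).
changed <- c(FALSE, diff(x) != 0)

# Running count of changes: 0 for the first run, 1 for the second, ...
grp <- cumsum(changed)
grp
# [1] 0 0 0 0 1 1 1 2 2 2 2 2 3 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6

# Applied to a data frame column (hypothetical 'batsmen' stand-in)
batsmen <- data.frame(Inns = x)
batsmen$Position <- as.integer(ave(batsmen$Inns, grp, FUN = seq_along))
```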
Psidom

We can use the base R rle function, extract its lengths component, and pass that to the sequence function to generate the appropriate sequence:

sequence(rle(x)$lengths)
#[1] 1 2 3 4 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1 2 3 4 5 1 2 3 4 5 6 7
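Splitting the one-liner into its parts shows why it works: rle() reports the length of each run of equal values, and sequence() concatenates 1:n for each of those lengths. The data frame below is a hypothetical stand-in for 'batsmen'. Unlike the ave() versions, sequence() already returns an integer vector, so no conversion is needed:

```r
x <- c(1,1,1,1,2,2,2,1,1,1,1,1,2,2,2,2,0,0,1,1,1,1,1,2,2,2,2,2,2,2)

r <- rle(x)
r$lengths  # length of each run of equal values
# [1] 4 3 5 4 2 5 7
r$values   # the value repeated in each run
# [1] 1 2 1 2 0 1 2

# sequence(c(4, 3, ...)) is c(seq_len(4), seq_len(3), ...)
batsmen <- data.frame(Inns = x)  # hypothetical stand-in for 'batsmen'
batsmen$Position <- sequence(rle(batsmen$Inns)$lengths)
```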
Ronak Shah