
Say I have the following data frame:

$Name     $Question
Bob       1
Bob       2  ---> Same Bob as above
Amy       1
Amy       2
Bob       1  ---> A different Bob than above, but shares the same name
Bob       2

So in short, names can occur multiple times, but only consecutive name values (up to the max number of questions) should be associated with the same unique identifier (ID). For instance, I'd like to create this column:

$Name     $Question    $ID
Bob       1            1
Bob       2            1
Amy       1            2
Amy       2            2
Bob       1            3
Bob       2            3

Question will always follow the same sequence, i.e. each unique person will have Questions 1 and 2.

The jank way I can think of doing this is something like

d$ID = rep(seq_len(number_unique_people), each = max_question_number)

Grouping by Name in dplyr and then numbering the groups does not work because all the Bob rows will be grouped together.

Any ideas?

pomegranate

1 Answer


As it turns out, this is trivially easy: rleid() from data.table gives each run of consecutive identical values its own ID.

library(data.table)
d$ID = rleid(d$Name)

Thanks Rich Scriven for his comment above!
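
For anyone without data.table, here is a minimal base R sketch of the same idea, rebuilt on the example data from the question. It uses rle() to number each run of consecutive identical names. One caveat: any run-length approach (rleid() included) will merge two different people who share a name and happen to sit next to each other. Assuming every person's block starts with Question 1, as the question states, counting those resets with cumsum() avoids that. The column name ID_by_question is made up for illustration.

```r
# Rebuild the example data from the question.
d <- data.frame(
  Name     = c("Bob", "Bob", "Amy", "Amy", "Bob", "Bob"),
  Question = c(1, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE  # rle() needs a plain character vector, not a factor
)

# Base R equivalent of rleid(): number each run of consecutive identical names.
runs <- rle(d$Name)
d$ID <- rep(seq_along(runs$values), times = runs$lengths)

# Alternative, assuming each person's rows begin with Question == 1:
# start a new ID at every Question 1. This also separates two adjacent
# people who share a name. (ID_by_question is a hypothetical column name.)
d$ID_by_question <- cumsum(d$Question == 1)

print(d)
```

On this data both columns come out as 1 1 2 2 3 3. If Name is stored as a factor, wrap it in as.character() before calling rle().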

pomegranate