
I am looking for a smart way to index subcategories within a data frame.
I've created a very simple reproducible example below. How would you code the step that goes from input to output (i.e. how can we create the color_id variable)?

Thank you very much in advance for your view on this!

input <- data.frame(label = c("red", "red", "blue", "green", "green", "green", "orange"), count = c(2, 2, 1, 3, 3, 3, 1))


output <- data.frame(label = c("red", "red", "blue", "green", "green", "green", "orange"), count = c(2, 2, 1, 3, 3, 3, 1), color_id = c(1, 2, 1, 1, 2, 3, 1))


Best regards

cho7tom
  • I can't currently find a good dupe for this. In base R you can use `?ave`, for example: `within(input, color_id <- ave(seq_along(label), label, FUN = seq_along))`, but there are many other ways of doing this. In dplyr: `input %>% group_by(label) %>% mutate(color_id = row_number())` – talat Jun 19 '15 at 09:19
  • @DavidArenburg This is a special case of the one I used, but the answer you linked does directly answer the question. How can I switch the dupe? – James Jun 19 '15 at 09:35
  • I think `splitstackshape` has a `getanID` function for this. – Pierre L Jun 19 '15 at 09:40
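
For reference, the base R suggestion from the comments can be written out as a small self-contained script. This is only a sketch that reuses the question's `input` data and the `ave()` call quoted above; the `output` name is borrowed from the question for illustration.

```r
# Rebuild the example input from the question.
input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# ave() applies FUN within each group of `label` and returns the results
# in the original row order, so seq_along() gives a running index per label.
output <- within(input, color_id <- ave(seq_along(label), label, FUN = seq_along))

print(output)
```

Because `ave()` keeps the original row order, this should also work when rows with the same label are not adjacent.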

2 Answers


Using data.table:

library(data.table)
# := adds the column by reference; the trailing [] prints the result
setDT(input)[, color_id := seq_len(.N), by = label][]
    label count color_id
1:    red     2        1
2:    red     2        2
3:   blue     1        1
4:  green     3        1
5:  green     3        2
6:  green     3        3
7: orange     1        1
grrgrrbla

Using splitstackshape:

library(splitstackshape)
getanID(input, 'label')
Pierre L