Indexing subgroups in a data frame

Question

I am looking for a smart way to index subcategories within a data frame.
I've created a very simple reproducible example below. How would you code the step from input to output (i.e., how can we create the color_id variable)?

Thank you very much in advance for your view on this!

input <- data.frame(label = c("red", "red", "blue", "green", "green", "green", "orange"), count = c(2, 2, 1, 3, 3, 3, 1))

output <- data.frame(label = c("red", "red", "blue", "green", "green", "green", "orange"), count = c(2, 2, 1, 3, 3, 3, 1), color_id = c(1, 2, 1, 1, 2, 3, 1))

Best regards

I can't currently find a good dupe for this. In base R you can use `?ave`, for example: `within(input, color_id <- ave(seq_along(label), label, FUN = seq_along))` but there are many other ways of doing this. In dplyr: `input %>% group_by(label) %>% mutate(color_id = row_number())` — talat, Jun 19 '15 at 09:19
@DavidArenburg This is a special case of the one I used, but the answer you linked does directly answer the question. How can I switch the dupe? — James, Jun 19 '15 at 09:35
I think `splitstackshape` has a `getanID` function for this. — Pierre L, Jun 19 '15 at 09:40

score 3 · Accepted Answer · answered Jun 19 '15 at 09:19

Using data.table:

library(data.table)
setDT(input)[ , color_id := seq_len(.N), by = label]
    label count color_id
1:    red     2        1
2:    red     2        2
3:   blue     1        1
4:  green     3        1
5:  green     3        2
6:  green     3        3
7: orange     1        1
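
As a self-contained sketch of the same approach, the snippet below rebuilds the question's `input`, adds the per-group counter, and compares it with the `color_id` values requested in the question. The name `expected_id` is introduced here only for illustration. The trailing `[]` is included because `:=` returns its result invisibly, so without it nothing may be printed at the console.

```r
library(data.table)

# Rebuild the question's example data so the sketch runs on its own.
input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# setDT() converts the data.frame to a data.table by reference.
# `:=` then adds color_id in place, numbering the rows within each label.
# The trailing [] asks data.table to print the updated table.
setDT(input)[, color_id := seq_len(.N), by = label][]

# expected_id (hypothetical name) holds the color_id column from the
# desired output in the question.
expected_id <- c(1, 2, 1, 1, 2, 3, 1)
all(input$color_id == expected_id)
```

Because `setDT()` works by reference, `input` itself is a data.table with the new column after this call; if the original data frame needs to stay untouched, `as.data.table(input)` would make a copy instead.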

score 0 · Answer 2 · answered Jun 19 '15 at 09:49

library(splitstackshape)
getanID(input, 'label')
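
A slightly fuller sketch of this call, under two assumptions that are worth checking against the installed version of the package: that `getanID()` returns a data.table, and that it names the new counter column `.id` when the id variables are not already unique. The names `res` and `expected_id` are introduced here only for illustration.

```r
library(splitstackshape)
library(data.table)

# Rebuild the question's example data so the sketch runs on its own.
input <- data.frame(
  label = c("red", "red", "blue", "green", "green", "green", "orange"),
  count = c(2, 2, 1, 3, 3, 3, 1)
)

# Number the rows within each label.
# Assumption: the counter is returned in a column called ".id".
res <- getanID(input, "label")

# Rename the counter to match the column name asked for in the question.
setnames(res, ".id", "color_id")

# expected_id (hypothetical name) holds the color_id column from the
# desired output in the question.
expected_id <- c(1, 2, 1, 1, 2, 3, 1)
all(res$color_id == expected_id)
```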

Pierre L

It's in the dupe link provided by the package author himself. No need to keep feeding the help vampire. – David Arenburg Jun 19 '15 at 09:54
