Group IDs within groups

Question

I want to create a group ID by numbering within groups. The important bit here is that the numbering restarts within each group rather than being global. In the following example, the data should be grouped by 'x', and a unique ID number should be created for each unique 'y' value within each group.

df <- data.frame(x=LETTERS[c(1:2, 1, 1:2, 1, 2)], y=LETTERS[c(1, 1, 2, 1, 1, 3, 3)] )

Input

x y  
A A  
B A  
A B  
A A  
B A  
A C  
B C

Desired outcome

x y ID  
A A  1  
B A  1  
A B  2  
A A  1  
B A  1  
A C  3  
B C  2

I'd favour a data.table way of doing it, but all solutions are welcome. I played around with data.table's .GRP, .N and seq_len(.N) to no avail. As it seems a simple and fairly common task, I can't believe no one has asked it here yet; at least, I failed to find it.

tmfmnk · Answer 1 · 2019-09-09T13:13:06.833


With dplyr, you can do:

library(dplyr)

df %>%
  group_by(x) %>%
  mutate(ID = cumsum(!duplicated(y)))

Output (from the question's original five-row example, before the MRE was amended):

  x     y        ID
  <fct> <fct> <int>
1 A     A         1
2 B     A         1
3 A     A         1
4 A     B         2
5 B     A         1
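Applied to the question's amended example data without sorting, the same `cumsum(!duplicated(y))` rule mislabels row 4: within group A, y = "A" reappears after a "B", and the running count never steps back, so that row gets ID 2 instead of the desired 1. A minimal base R sketch of this (it uses `ave()` in place of `group_by()`/`mutate()`, an assumption made only to keep the example free of package dependencies):

```r
# Amended example data from the question (character columns)
df <- data.frame(x = LETTERS[c(1:2, 1, 1:2, 1, 2)],
                 y = LETTERS[c(1, 1, 2, 1, 1, 3, 3)],
                 stringsAsFactors = FALSE)

# cumsum(!duplicated(v)) increments at every first occurrence, so it only
# gives one ID per value when equal values are contiguous within the group
count_new <- function(v) cumsum(!duplicated(v))

# ave() applies the function within each x group and keeps the row order;
# it returns the type of its first argument (character), hence as.integer()
unsorted_id <- as.integer(ave(df$y, df$x, FUN = count_new))
unsorted_id
# [1] 1 1 2 2 1 3 2   (row 4 gets 2; the desired ID is 1)

# Sorting by x and then y makes equal y values adjacent within each group
sorted <- df[order(df$x, df$y), ]
sorted$ID <- as.integer(ave(sorted$y, sorted$x, FUN = count_new))
sorted$ID
# [1] 1 1 2 3 1 1 2
```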

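If a y value can reappear within a group after a different value (as in row 4 of the amended example), arrange the data first so that equal y values are adjacent within each group:

df %>%
  arrange(x, y) %>%
  group_by(x) %>%
  mutate(ID = cumsum(!duplicated(y)))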

And the same with data.table could be:

library(data.table)

setDT(df)[, ID := cumsum(!duplicated(y)), by = x]

And if you need to arrange it first:

setorder(setDT(df), x, y)[, ID := cumsum(!duplicated(y)), by = x]
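Both arranged versions reorder the rows. If the original row order should be kept, one option is to number each y by its first appearance within its x group with `match(y, unique(y))`, the same rule the `split()`/`match()` base R answer uses. A minimal data.table sketch, assuming IDs should follow order of first appearance (as in the desired outcome) rather than alphabetical order:

```r
library(data.table)

# Amended example data from the question
dt <- data.table(x = LETTERS[c(1:2, 1, 1:2, 1, 2)],
                 y = LETTERS[c(1, 1, 2, 1, 1, 3, 3)])

# match(y, unique(y)) maps each y to the position of its first appearance
# within the x group; no setorder() call, so the rows stay in place
dt[, ID := match(y, unique(y)), by = x]
dt$ID
# [1] 1 1 2 1 1 3 2
```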

edited Sep 09 '19 at 13:13

answered Sep 09 '19 at 12:16

tmfmnk


Very nice; I adjusted the MRE to include the edge cases you (@tmfmnk) pointed out. – Vallo Varik Sep 09 '19 at 14:29
Well, my initial MRE was an edge case; your solution with arrange is more universal and serves everyone better. – Vallo Varik Sep 09 '19 at 14:48

jogo · Accepted Answer · 2019-09-09T17:22:35.683


Here is a solution with data.table:

library("data.table")

df <- data.table(x = LETTERS[c(1:2, 1, 1:2, 1, 2)], y = LETTERS[c(1, 1, 2, 1, 1, 3, 3)])
# Within each x group, the factor levels are that group's sorted unique y values
df[, ID := as.numeric(as.factor(y)), by = x]
df
#    x y ID
# 1: A A  1
# 2: B A  1
# 3: A B  2
# 4: A A  1
# 5: B A  1
# 6: A C  3
# 7: B C  2
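Because `as.factor()` sorts the levels, the IDs within each group follow the alphabetical order of y rather than the order in which values first appear. For the question's data the two orders coincide. A minimal base R illustration with a hypothetical vector where they differ:

```r
# Hypothetical y values for a single group, not from the question
y <- c("C", "A", "C", "B")

# factor() levels are sorted (A, B, C), so IDs reflect alphabetical rank
factor_id <- as.integer(factor(y))
factor_id
# [1] 3 1 3 2

# match() against unique() numbers values by first appearance (C, A, B)
first_seen_id <- match(y, unique(y))
first_seen_id
# [1] 1 2 1 3
```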

edited Sep 09 '19 at 17:22

answered Sep 09 '19 at 12:21

jogo


Thanks for contributing, and sorry for misleading you. Your solution, indeed, takes my initial MRE and arrives at the desired output. However, this is due to shortcomings of my initial MRE. The real point was to group by 'x' and then label 'y', which your solution does not address (see my current amended MRE). – Vallo Varik Sep 09 '19 at 14:35

score 0 · Answer 3 · answered Sep 09 '19 at 12:33


Here is a base R solution. Note that it reorders the rows of your data frame (the result is grouped by x):

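do.call(rbind, lapply(split(df, df$x), function(i) {
  # Number each y by its first appearance within the group
  cbind(i, ID = match(i$y, unique(i$y)))
}))

# Output for the question's original five-row example data:
#    x y ID
#A.1 A A  1
#A.3 A A  1
#A.4 A B  2
#B.2 B A  1
#B.5 B A  1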
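If the row reordering is unwanted, the same `split()`/`match()` idea can be kept and the per-group results put back in the original row order with `unsplit()`. This is a sketch of an alternative, not part of the answer above; it assumes character (not factor) columns, as in the amended example:

```r
# Amended example data from the question
df <- data.frame(x = LETTERS[c(1:2, 1, 1:2, 1, 2)],
                 y = LETTERS[c(1, 1, 2, 1, 1, 3, 3)],
                 stringsAsFactors = FALSE)

# Number each y by its first appearance within its x group
ids_by_group <- lapply(split(df$y, df$x), function(v) match(v, unique(v)))

# unsplit() reassembles the per-group vectors in the original row order
df$ID <- unsplit(ids_by_group, df$x)
df$ID
# [1] 1 1 2 1 1 3 2
```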

answered Sep 09 '19 at 12:33

Sotos


Nice, this also works. The solution by @tmfmnk also rearranges my data frame, which is fine for me, but thanks for pointing it out. – Vallo Varik Sep 09 '19 at 14:36
