
I want to create an ID by numbering within groups. The important point is that the numbering restarts within each group rather than running globally. In the following example, the data should be grouped by 'x', and within each group every distinct 'y' value should get its own ID number.

df <- data.frame(x=LETTERS[c(1:2, 1, 1:2, 1, 2)], y=LETTERS[c(1, 1, 2, 1, 1, 3, 3)] )

Input

x y  
A A  
B A  
A B  
A A  
B A  
A C  
B C  

Desired outcome

x y ID  
A A  1  
B A  1  
A B  2  
A A  1  
B A  1  
A C  3  
B C  2  

I'd favour a data.table way of doing it, but all solutions are welcome. I played around with data.table's .GRP, .N and seq_len(.N), to no avail. As this seems a simple and fairly common task, I can't believe no one has asked it here yet; at least, I failed to find it.

Vallo Varik

3 Answers


With dplyr, you can do:

df %>%
 group_by(x) %>%
 mutate(ID = cumsum(!duplicated(y)))

  x     y        ID
  <fct> <fct> <int>
1 A     A         1
2 B     A         1
3 A     A         1
4 A     B         2
5 B     A         1

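This relies on equal y values being adjacent within each x group, because cumsum(!duplicated(y)) only increments at the first occurrence of each value. If they are not adjacent, as in the question's data, where the fourth row (A, A) comes after (A, B) and would get ID 2 instead of 1, arrange the data first: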

df %>%
 arrange(x, y) %>%
 group_by(x) %>%
 mutate(ID = cumsum(!duplicated(y)))

And the same with data.table could be:

setDT(df)[, ID := cumsum(!duplicated(y)), by = x]

And if you need to arrange it first:

setorder(setDT(df), x, y)[, ID := cumsum(!duplicated(y)), by = x]
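
If the rows should stay in their original order, one possible variant is to number each y by the position of its first appearance within the group. This is only a sketch on the question's data, and it assumes first-appearance numbering is what is wanted (here it coincides with the desired output):

```r
library(data.table)

# Data from the question
df <- data.table(x = LETTERS[c(1:2, 1, 1:2, 1, 2)],
                 y = LETTERS[c(1, 1, 2, 1, 1, 3, 3)])

# Within each x, match() returns the index of each y among the
# distinct y values of that group, in order of first appearance
df[, ID := match(y, unique(y)), by = x]

df$ID
# 1 1 2 1 1 3 2
```

Because match() looks each value up rather than counting changes, it does not need the data to be sorted.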
tmfmnk

Here is a solution with data.table:

library("data.table")

df <- data.table(x=LETTERS[c(1:2, 1, 1:2, 1, 2)], y=LETTERS[c(1, 1, 2, 1, 1, 3, 3)] )
df[, ID := as.numeric(as.factor(y)), by = x]
df
# > df
#    x y ID
# 1: A A  1
# 2: B A  1
# 3: A B  2
# 4: A A  1
# 5: B A  1
# 6: A C  3
# 7: B C  2
jogo
  • Thanks for contributing and sorry for misleading. Your solution, indeed, takes my initial MRE and arrives at the desired output. However, this is due to shortcomings of my initial MRE. The real point was to group by 'x' and then label 'y' which your solution does not address (see my current amended MRE). – Vallo Varik Sep 09 '19 at 14:35

Here is a base R solution. Note that it reorders the rows of your data frame, because split() collects them by x before they are bound back together:

do.call(rbind, lapply(split(df, df$x), function(i) cbind(i, ID = match(i$y, unique(i$y)))))

#    x y ID
#A.1 A A  1
#A.3 A A  1
#A.4 A B  2
#B.2 B A  1
#B.5 B A  1
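
If the original row order matters, a base R sketch along the same lines could use ave(), which writes each group's result back into the original row positions. The index trick (passing seq_along() and subsetting y inside FUN) is just one way to do it, not the only one:

```r
# Data from the question
df <- data.frame(x = LETTERS[c(1:2, 1, 1:2, 1, 2)],
                 y = LETTERS[c(1, 1, 2, 1, 1, 3, 3)])

# ave() passes the row indices of each x group to FUN and puts the
# returned IDs back at those same row positions
df$ID <- ave(seq_along(df$y), df$x,
             FUN = function(i) match(df$y[i], unique(df$y[i])))

df$ID
# 1 1 2 1 1 3 2
```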
Sotos
  • Nice, this also works. The solution by @tmfmnk also rearranges my data frame, which is fine for me, but thanks for pointing it out. – Vallo Varik Sep 09 '19 at 14:36