This is related to several existing duplicates (1, 2, 3), but it is a slightly different problem that I'm stuck on. So far, I've only seen pandas solutions.

In this data table:

library(data.table)

dt = data.table(gr = rep(letters[1:2], each = 6), 
                cl = rep(letters[1:4], each = 3))

    gr cl
 1:  a  a
 2:  a  a
 3:  a  a
 4:  a  b
 5:  a  b
 6:  a  b
 7:  b  c
 8:  b  c
 9:  b  c
10:  b  d
11:  b  d
12:  b  d

I'd like to enumerate unique classes per group to obtain this:

    gr cl id
 1:  a  a  1
 2:  a  a  1
 3:  a  a  1
 4:  a  b  2
 5:  a  b  2
 6:  a  b  2
 7:  b  c  1
 8:  b  c  1
 9:  b  c  1
10:  b  d  2
11:  b  d  2
12:  b  d  2
mattek

3 Answers

Try rleid() from data.table, applied to cl within each gr:

library(data.table)
dt[, id := rleid(cl), by=gr]
dt
#    gr cl id
# 1:  a  a  1
# 2:  a  a  1
# 3:  a  a  1
# 4:  a  b  2
# 5:  a  b  2
# 6:  a  b  2
# 7:  b  c  1
# 8:  b  c  1
# 9:  b  c  1
#10:  b  d  2
#11:  b  d  2
#12:  b  d  2
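
One caveat that may be worth a quick sketch: rleid() numbers runs of consecutive identical values, so it reproduces the desired ids only when equal cl values sit next to each other within each gr. The toy table dt2 and the column run_id below are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical data: class "a" appears in two separate runs within group "x"
dt2 <- data.table(gr = rep("x", 5),
                  cl = c("a", "a", "b", "a", "b"))

# rleid() starts a new id at every change of value,
# so the second run of "a" gets its own id
dt2[, run_id := rleid(cl), by = gr]

# Sorting by group and class first makes equal classes contiguous,
# after which rleid() yields one id per distinct class
setorder(dt2, gr, cl)
dt2[, id := rleid(cl), by = gr]
dt2
```

If the original row order matters, sorting by reference with setorder() may not be acceptable, and an approach that does not depend on contiguity could be a better fit.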
markus

You can do the following (this assumes rows with the same cl are contiguous within each gr; sort the data first if they are not):

dt[, id := cumsum(!duplicated(cl)), by = gr]

    gr cl id
 1:  a  a  1
 2:  a  a  1
 3:  a  a  1
 4:  a  b  2
 5:  a  b  2
 6:  a  b  2
 7:  b  c  1
 8:  b  c  1
 9:  b  c  1
10:  b  d  2
11:  b  d  2
12:  b  d  2

The same with dplyr:

library(dplyr)

dt %>%
 group_by(gr) %>%
 mutate(id = cumsum(!duplicated(cl)))

Or an rleid()-like alternative built on base R's rle():

dt %>%
 group_by(gr) %>%
 mutate(id = with(rle(cl), rep(seq_along(lengths), lengths)))
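
To make the sorting caveat at the top of this answer concrete, here is a minimal sketch on unsorted data. The table dt3 and the columns id_cumsum and id_match are hypothetical names used only for this illustration, and match(cl, unique(cl)) is a swapped-in base R idiom, not something the answer itself proposes.

```r
library(data.table)

# Hypothetical data: classes "b" and "a" are interleaved within group "x"
dt3 <- data.table(gr = rep("x", 5),
                  cl = c("b", "a", "b", "a", "c"))

# cumsum(!duplicated()) only increments at first occurrences,
# so a repeated class inherits whatever the running count is at that row
dt3[, id_cumsum := cumsum(!duplicated(cl)), by = gr]

# match() against unique() looks up each class's position
# in order of first appearance, regardless of row order
dt3[, id_match := match(cl, unique(cl)), by = gr]
dt3
```

In this sketch the second "b" receives id_cumsum 2 even though the first "b" received 1, while id_match stays consistent per class without sorting.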
tmfmnk

An alternative solution using factor(), which does not require sorting the data first:

library(dplyr)

dt %>%
  group_by(gr) %>%
  mutate(id = as.numeric(factor(cl))) %>%
  ungroup()

# # A tibble: 12 x 3
#   gr    cl       id
#   <chr> <chr> <dbl>
# 1 a     a         1
# 2 a     a         1
# 3 a     a         1
# 4 a     b         2
# 5 a     b         2
# 6 a     b         2
# 7 b     c         1
# 8 b     c         1
# 9 b     c         1
#10 b     d         2
#11 b     d         2
#12 b     d         2

Note that this assigns each id based on the sorted (alphabetical) order of the cl values within each gr group, not on the order in which the values first appear.
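
A small base R sketch of that note, using a made-up vector cl_demo (not the question's data): by default factor() sorts its levels, and passing levels = unique(...) is one way to get ids in order of first appearance instead.

```r
# Hypothetical class vector in which "b" appears before "a"
cl_demo <- c("b", "a", "b", "a", "c")

# Default factor(): levels are sorted, so "a" becomes 1 and "b" becomes 2
id_sorted <- as.integer(factor(cl_demo))

# Explicit levels in order of first appearance: "b" becomes 1 and "a" becomes 2
id_first_seen <- as.integer(factor(cl_demo, levels = unique(cl_demo)))

id_sorted
id_first_seen
```

Inside a grouped mutate() the same levels = unique(cl) argument should apply per group, though that variant is an assumption here and was not part of the original answer.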

AntoniosK