1

Take this data frame for example:

DT <- data.table(A = rep(1:3, each=4), 
                 B = rep(c(NA,1,2,4), each=3), 
                 C = rep(1:2, 6))

I want to append a column that assign index to unique combinations of A and B, but ignore C. I also want another column that count the number of duplicates, that looks like this:

    A  B C Index Count
 1: 1 NA 1     1     3
 2: 1 NA 2     1     3
 3: 1 NA 1     1     3
 4: 1  1 2     2     1
 5: 2  1 1     3     2
 6: 2  1 2     3     2
 7: 2  2 1     4     2
 8: 2  2 2     4     2
 9: 3  2 1     5     1
10: 3  4 2     6     3
11: 3  4 1     6     3
12: 3  4 2     6     3

I don't want to trim the data frame and (preferably)I don't want to reorder the rows. I tried setDT, such as

setDT(DT)[,.(.I, .N), by = names(DT[,1:2])]

But the I column is not the index I want, and Column C is gone. Thanks in advance!

Chuck C
  • 153
  • 2
  • 12
  • check this out: https://stackoverflow.com/questions/17421776/how-to-add-count-of-unique-values-by-group-to-r-data-frame – Gaurav Taneja Sep 22 '17 at 02:07
  • 1
    `DT[, g := rleid(A,B)][, n := .N, by=g][]` I guess there is a dupe somewhere. – Frank Sep 22 '17 at 02:18
  • @Frank, your solution worked for the example I gave in the post, but it only works if the data is aligned in this way, if for example, replace the position between B[2] and B[4], your solution will give a wrong result. To make a reproducible new data frame, please try: DT <- data.table(A = rep(1:3, each=4), B = rep(c(NA,1,2,NA,1,2,1,3,2,1,1,3)), C = rep(1:2, 6)) – Chuck C Sep 22 '17 at 02:35
  • 3
    Ok, if your data is not sorted, there's `DT[, \`:=\`(g = .GRP, n = .N), by=.(A,B)]` I think you'll see similar stuff in the intro vignettes for data.table. – Frank Sep 22 '17 at 02:38
  • @Frank, awesome! Thank you so much! – Chuck C Sep 22 '17 at 02:41
  • 2
    @Frank - you can also do that a little less esoterically with `DT[, c("g","n") := .(.GRP, .N), by=.(A,B)]` – thelatemail Sep 22 '17 at 03:02

0 Answers0