I am trying to find the number of unique rows in a data.table, for each unique element in "A". Here's what I did:

library(data.table)
DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6), key = "A")

unique(DT,by=names(DT)) #Gives me each unique row in DT
#    A B C
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 2
# 4: 2 2 1
# 5: 2 2 2
# 6: 2 3 1
# 7: 2 3 2
# 8: 3 3 1
# 9: 3 4 2
#10: 3 4 1
nrow(unique(DT,by=names(DT))) #Gives me the number of unique rows in DT
# [1] 10

DT[,nrow(unique(DT,by=names(DT))),by=A] #Doesn't give me the number of unique rows for each unique DT$A.
#   A V1
# 1: 1 10
# 2: 2 10
# 3: 3 10

Can anyone see what I am doing wrong here?

Wet Feet

2 Answers

I think you want to use .SD (the subset of the data for each group, without the grouping column):

DT[,nrow(unique(.SD)),by=A]

#   A V1
#1: 1  3
#2: 2  4
#3: 3  3
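
As a minimal sketch of why this works, the snippet below rebuilds the table from the question and counts unique rows per group in two ways. The column name `n_unique` and the "deduplicate first, then count" variant are illustrative additions, not part of the original answer. It spells out `by = names(DT)` so that it should not depend on the default `by` of `unique()`, which has differed for keyed tables across data.table versions.

```r
library(data.table)

# Same table as in the question.
DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A")

# Inside j, .SD contains only the rows of the current group (columns B and C),
# so unique() is evaluated once per value of A.
per_group <- DT[, .(n_unique = nrow(unique(.SD))), by = A]

# Alternative (illustrative): drop duplicate rows from the whole table first,
# then count the remaining rows per group with .N.
dedup_then_count <- unique(DT, by = names(DT))[, .(n_unique = .N), by = A]

print(per_group)
print(dedup_then_count)
```

Both approaches should give 3, 4 and 3 unique rows for A = 1, 2 and 3 respectively.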
Troy
  • Hi @WetFeet. I get the same result for `unique(DT)` and `unique(DT,by=names(DT))`? You don't need to specify `by=` for .SD, because it's the result of grouping DT by A. Not sure if that helps! What difference are you seeing between `unique(DT)` and `unique(DT,by=names(DT))`? – Troy Jan 22 '14 at 09:55
  • Actually, I meant `DT[,nrow(unique(.SD)),by=A]` and `DT[,nrow(unique(.SD,by=names(DT))),by=A]`. But after reading the documentation for unique.data.table, I understood the difference. :) – Wet Feet Jan 23 '14 at 02:47
Because nrow(unique(DT,by=names(DT))) refers to the whole table DT, it evaluates to 10 for every group, so you are effectively writing DT[,10,by=A].
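
A small sketch of that point, using the table from the question (the variable names `as_written` and `constant` are only for illustration):

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A")

# The j expression names DT itself, not the per-group subset, so it is
# computed on all 12 rows each time and returns the same value per group.
as_written <- DT[, nrow(unique(DT, by = names(DT))), by = A]

# Equivalent to returning the constant 10 for every group.
constant <- DT[, 10L, by = A]

print(as_written)
print(constant)
```

Both tables should show V1 equal to 10 for each value of A, which matches the output in the question.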

JeremyS