Number of unique rows for each unique element in column A

Question

I am trying to find the number of unique rows in a data.table for each unique value in column "A". Here's what I did:

DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6), key = "A")

unique(DT,by=names(DT)) #Gives me each unique row in DT
#    A B C
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 2
# 4: 2 2 1
# 5: 2 2 2
# 6: 2 3 1
# 7: 2 3 2
# 8: 3 3 1
# 9: 3 4 2
#10: 3 4 1
nrow(unique(DT,by=names(DT))) #Gives me the number of unique rows in DT
# [1] 10
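As a quick sanity check on where the 10 comes from, the sketch below rebuilds the same DT and asks `duplicated()` which of the 12 rows repeat an earlier row across every column. It assumes the data.table package is installed. `dup_rows` is just an illustrative variable name.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A")

# TRUE marks a row that repeats an earlier row across all columns
dup_rows <- which(duplicated(DT, by = names(DT)))
dup_rows
# [1]  3 12

# 12 rows minus the 2 repeats leaves the 10 unique rows
nrow(DT) - length(dup_rows)
# [1] 10
```

Row 3 repeats row 1 (1, 1, 1) and row 12 repeats row 10 (3, 4, 2). Both repeats fall inside their own A group, which is why the per-group counts below sum to 10.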

DT[,nrow(unique(DT,by=names(DT))),by=A] #Doesn't give me the number of unique rows for each unique DT$A.
#   A V1
# 1: 1 10
# 2: 2 10
# 3: 3 10

Can anyone see what I am doing wrong here?

score 3 · Accepted Answer · answered Jan 22 '14 at 09:38


I think you want to use .SD (the sub-table for each group) instead of DT inside j:

DT[,nrow(unique(.SD)),by=A]

#   A V1
#1: 1  3
#2: 2  4
#3: 3  3
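A minimal sketch of two other ways to get the same per-group counts, assuming a data.table release recent enough to provide `uniqueN()`. `per_group` and `per_group2` are illustrative names. .SD excludes the grouping column A, but A is constant within a group, so the unique (B, C) rows of a group correspond one-to-one to its unique (A, B, C) rows.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A")

# uniqueN() on a data.table counts its distinct rows; here, per group
per_group <- DT[, .(n_unique = uniqueN(.SD)), by = A]

# Alternative: de-duplicate on all columns first, then count rows per A
per_group2 <- unique(DT, by = names(DT))[, .(n_unique = .N), by = A]

# Both give n_unique = 3, 4, 3 for A = 1, 2, 3
per_group
per_group2
```

The second form does the de-duplication once over the whole table and then lets `.N` count rows per group, which avoids calling `unique()` separately inside every group.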


Troy


Hi @WetFeet, I get the same result for `unique(DT)` and `unique(DT,by=names(DT))`. You don't need to specify `by=` for .SD, because .SD is already the result of grouping DT by A. Not sure if that helps! What difference are you seeing between `unique(DT)` and `unique(DT,by=names(DT))`? – Troy Jan 22 '14 at 09:55
Actually, I meant `DT[,nrow(unique(.SD)),by=A]` and `DT[,nrow(unique(.SD,by=names(DT))),by=A]`. But after reading the documentation for unique.data.table, I understood the difference. :) – Wet Feet Jan 23 '14 at 02:47

score 2 · Answer 2 · answered Jan 22 '14 at 09:30


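Because `nrow(unique(DT,by=names(DT)))` is 10, you are basically saying `DT[,10,by=A]`: the expression in j refers to the whole DT rather than to the rows of the current group, so every group gets the same value.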
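The same effect is easy to see with a simpler expression. In this sketch (assuming the data.table package is installed), j is evaluated once per group, but `nrow(DT)` looks up the full 12-row table, while `.N` is bound to the size of the current group.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A")

# j names the whole table, so every group gets 12 (column V1)
whole <- DT[, nrow(DT), by = A]

# .N refers to the current group instead, so each group gets 4 (column N)
per_group <- DT[, .N, by = A]

whole
per_group
```

Swapping `DT` for `.SD` inside `unique()` is the same fix: it points the expression at the current group's rows.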


JeremyS

