count number of unique elements in a datatable with by=list()

Question

There are lots of question and answer on this topic.. I tried some of them which include count(), length(), unique(), unlist(), ... It seems simple but i couldn't find a solution..

Let me explain it on an example, I have a datatable :

DT =    data.table(x=c("a","a","a","a","a","a","a","a","a","b","b","b","b","b"),  y=c(1,1,1,1,2,4,4,4,6,6,6,6,6,7), z=c(1,1,2,3,4,5,6,6,7,8,8,8,9,10))
> DT
       x     y     z
 1:    a     1     1
 2:    a     1     1
 3:    a     1     2
 4:    a     1     3
 5:    a     2     4
 6:    a     4     5
 7:    a     4     6
 8:    a     4     6
 9:    a     6     7
10:    b     6     8
11:    b     6     8
12:    b     6     8
13:    b     6     9
14:    b     7    10

I want a new column t :

DT <- DT[,t := length(count(z)),by=list(x,y)]

When x=a and y=1: z has 3 different values

When x=a and y=2: z has 1 different value

When x=b and y=6: z has 2 different values, and so on...

Therefore the column t should be like :

    x y  z t
 1: a 1  1 3
 2: a 1  1 3
 3: a 1  2 3
 4: a 1  3 3
 5: a 2  4 1
 6: a 4  5 2
 7: a 4  6 2
 8: a 4  6 2
 9: a 6  7 1
10: b 6  8 2
11: b 6  8 2
12: b 6  8 2
13: b 6  9 2
14: b 7 10 1

`DT[, v := uniqueN(z), by=.(x,y)]` should work. – Frank Jul 25 '18 at 14:04 — Frank, Jul 25 '18 at 14:04
@Frank Thank you, This is simple enough :)) – noname Jul 25 '18 at 14:10 — noname, Jul 25 '18 at 14:10

score 1 · Accepted Answer · answered Jul 25 '18 at 14:06

1

Perhaps,

DT[, t := uniqueN(z), by = list(x, y)][]

is doing what you expect.

    x y  z t
 1: a 1  1 3
 2: a 1  1 3
 3: a 1  2 3
 4: a 1  3 3
 5: a 2  4 1
 6: a 4  5 2
 7: a 4  6 2
 8: a 4  6 2
 9: a 6  7 1
10: b 6  8 2
11: b 6  8 2
12: b 6  8 2
13: b 6  9 2
14: b 7 10 1

answered Jul 25 '18 at 14:06

Uwe

41,420
11
90
134

1

Fwiw, I've found uniqueN can be slow with `by=` (and besides, nothing has GForce with := yet), so I often do `DT[, v:= DT[, .N, by=.(x,y,z)][, .N, by=.(x,y)][DT, on=.(x,y), x.N]]` – Frank Jul 25 '18 at 14:12

count number of unique elements in a datatable with by=list()

1 Answers1