1

There are lots of question and answer on this topic.. I tried some of them which include count(), length(), unique(), unlist(), ... It seems simple but i couldn't find a solution..

Let me explain it on an example, I have a datatable :

DT =    data.table(x=c("a","a","a","a","a","a","a","a","a","b","b","b","b","b"),  y=c(1,1,1,1,2,4,4,4,6,6,6,6,6,7), z=c(1,1,2,3,4,5,6,6,7,8,8,8,9,10))
> DT
       x     y     z
 1:    a     1     1
 2:    a     1     1
 3:    a     1     2
 4:    a     1     3
 5:    a     2     4
 6:    a     4     5
 7:    a     4     6
 8:    a     4     6
 9:    a     6     7
10:    b     6     8
11:    b     6     8
12:    b     6     8
13:    b     6     9
14:    b     7    10

I want a new column t :

DT <- DT[,t := length(count(z)),by=list(x,y)]

When x=a and y=1: z has 3 different values

When x=a and y=2: z has 1 different value

When x=b and y=6: z has 2 different values, and so on...

Therefore the column t should be like :

    x y  z t
 1: a 1  1 3
 2: a 1  1 3
 3: a 1  2 3
 4: a 1  3 3
 5: a 2  4 1
 6: a 4  5 2
 7: a 4  6 2
 8: a 4  6 2
 9: a 6  7 1
10: b 6  8 2
11: b 6  8 2
12: b 6  8 2
13: b 6  9 2
14: b 7 10 1
Frank
  • 66,179
  • 8
  • 96
  • 180
noname
  • 51
  • 4

1 Answers1

1

Perhaps,

DT[, t := uniqueN(z), by = list(x, y)][]

is doing what you expect.

    x y  z t
 1: a 1  1 3
 2: a 1  1 3
 3: a 1  2 3
 4: a 1  3 3
 5: a 2  4 1
 6: a 4  5 2
 7: a 4  6 2
 8: a 4  6 2
 9: a 6  7 1
10: b 6  8 2
11: b 6  8 2
12: b 6  8 2
13: b 6  9 2
14: b 7 10 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
  • 1
    Fwiw, I've found uniqueN can be slow with `by=` (and besides, nothing has GForce with := yet), so I often do `DT[, v:= DT[, .N, by=.(x,y,z)][, .N, by=.(x,y)][DT, on=.(x,y), x.N]]` – Frank Jul 25 '18 at 14:12