Problem:
What is the data.table
equivalent of tidyr
's complete
command with group by
?
What is the relationship between on
and by
for data.table
?
Example:
dt=data.table(a = c(1,1,2,2,3,3,4,4) , b = c(4,5,6,7,8,9,10,11) , c = c("x","x","x","x","y","y","y","y"))
show(dt)
a b c
1: 1 4 x
2: 1 5 x
3: 2 6 x
4: 2 7 x
5: 3 8 y
6: 3 9 y
7: 4 10 y
8: 4 11 y
The goal is to obtain the following:
a b c
1 4 x
1 5 x
1 6 x
1 7 x
2 4 x
2 5 x
2 6 x
2 7 x
3 8 y
3 9 y
3 10 y
3 11 y
4 8 y
4 9 y
4 10 y
4 11 y
so something approximately like this:
setDT(dt)[CJ(a=a,b=b,unique=TRUE), on=.(a,b) , by = .(c)]
but it doesn't work and the data.table
documentation is thin on this aspect of the syntax.
Insufficient solutions:
The following SO posts address similar problems, but do not provide sufficient solutions in this context.
- data.table equivalent of complete/fill from tidyr (no group
by
) - data.table equivalent of tidyr::complete() (no group
by
) - data.table equivalent of tidyr::complete with group_by (problem-specific, does not actually work with
by
command)