I have a data table, with a grouping column, and would like to pad out the rows so that every group has the same amount of rows as the most populous. Consider this data where column a has the variable and column b is the group;
## dummy data
require(data.table)
dt <- data.table(a=c(1,3,5,1,2,3,4,5,1,5),
b=c("gpA","gpA","gpA","gpB","gpB","gpB","gpB","gpB","gpC","gpC"),
c=c(2346,2314,345,234,578,36,234,547,3425,234))
a b c
1: 1 gpA 2346
2: 3 gpA 2314
3: 5 gpA 345
4: 1 gpB 234
5: 2 gpB 578
6: 3 gpB 36
7: 4 gpB 234
8: 5 gpB 547
9: 1 gpC 3425
10: 5 gpC 234
Group B has the most rows, with column 'a' running through 1 to 5. I would like to 'pad out' the remaining groups to have the same values of column 'a' but column 'c' can be NA or 0.
I have done this with CJ
, following this great answer, but after the initial execution the rows with NA in column c are deleted. Why is this?
# setkey and pad out rows with CJ
setkey(dt, b, a)
> dt[CJ(b,a, unique=T)]
a b c
1: 1 gpA 2346
2: 2 gpA NA
3: 3 gpA 2314
4: 4 gpA NA
5: 5 gpA 345
6: 1 gpB 234
7: 2 gpB 578
8: 3 gpB 36
9: 4 gpB 234
10: 5 gpB 547
11: 1 gpC 3425
12: 2 gpC NA
13: 3 gpC NA
14: 4 gpC NA
15: 5 gpC 234
> dt
a b c
1: 1 gpA 2346
2: 3 gpA 2314
3: 5 gpA 345
4: 1 gpB 234
5: 2 gpB 578
6: 3 gpB 36
7: 4 gpB 234
8: 5 gpB 547
9: 1 gpC 3425
10: 5 gpC 234
>