2

I have a data table, with a grouping column, and would like to pad out the rows so that every group has the same amount of rows as the most populous. Consider this data where column a has the variable and column b is the group;

## dummy data
require(data.table)
dt <- data.table(a=c(1,3,5,1,2,3,4,5,1,5),
b=c("gpA","gpA","gpA","gpB","gpB","gpB","gpB","gpB","gpC","gpC"),
c=c(2346,2314,345,234,578,36,234,547,3425,234))

   a   b    c
1: 1 gpA 2346
2: 3 gpA 2314
3: 5 gpA  345
4: 1 gpB  234
5: 2 gpB  578
6: 3 gpB   36
7: 4 gpB  234
8: 5 gpB  547
9: 1 gpC 3425
10: 5 gpC  234

Group B has the most rows, with column 'a' running through 1 to 5. I would like to 'pad out' the remaining groups to have the same values of column 'a' but column 'c' can be NA or 0.

I have done this with CJ, following this great answer, but after the initial execution the rows with NA in column c are deleted. Why is this?

# setkey and pad out rows with CJ
setkey(dt, b, a)

> dt[CJ(b,a, unique=T)]
    a   b    c
 1: 1 gpA 2346
 2: 2 gpA   NA
 3: 3 gpA 2314
 4: 4 gpA   NA
 5: 5 gpA  345
 6: 1 gpB  234
 7: 2 gpB  578
 8: 3 gpB   36
 9: 4 gpB  234
10: 5 gpB  547
11: 1 gpC 3425
12: 2 gpC   NA
13: 3 gpC   NA
14: 4 gpC   NA
15: 5 gpC  234
> dt
    a   b    c
 1: 1 gpA 2346
 2: 3 gpA 2314
 3: 5 gpA  345
 4: 1 gpB  234
 5: 2 gpB  578
 6: 3 gpB   36
 7: 4 gpB  234
 8: 5 gpB  547
 9: 1 gpC 3425
10: 5 gpC  234
> 
Sam
  • 1,400
  • 13
  • 29

1 Answers1

1

You have to reassign the function to your original variable to save the results.

dt <- dt[CJ(b,a, unique=T)]
  • thanks, I think i was a bit intent on not using subassignment with <-, bit if it works it works! – Sam Apr 06 '18 at 14:38
  • if you don't want to have to keep reassigning your variables over and over again, might I suggest taking a look at the dplyr package. – Roberto Moratore Apr 06 '18 at 14:45
  • sure thanks, i think i did see a good answer for this in dplyr. cheers though. – Sam Apr 06 '18 at 14:49