I am trying to perform a cartesian join of two data tables. Because my tables are large (10s of Gbs), I need to do this in a memory-efficient and way. So I thought I would update one of the tables by reference in a data.table
-centric way.
Here's what I am trying to achieve:
# fake data
A = data.table(id = c(1, 1, 1, 2, 2, 3),
X = 0:5,
Y = letters[1:6],
key = 'id')
B = data.table(id = rep(1:3, each = 3) ,
time = rep(1:3, 3),
Z = 8:0,
key = c('id', 'time'))
# the desired join: cartesian join with columns from both tables
B[A, on = 'id', allow.cartesian = T]
However, no tables get updated in this solution. And I want to avoid creating a 3rd table, for memory reasons.
How can I achieve this by updating one of the tables (A
or B
) by reference?
Partially inspired by the answers to this question, here's what I have tried without success:
# table B gets the new columns, but not the new rows :(
B[A, allow.cartesian = T, `:=` (X = i.X, Y = i.Y)]
As a bonus, it would be nice if I didn't have to spell out each and every column in the A
table (as in := (X = i.X, Y = i.Y)
), because I want to use all the columns in A.
Thoughts?