Can someone explain in layman's terms what the difference is between these two approaches, besides the order of the result?
A <- data.table(id = letters[1:10], amount = 1:10)
B <- data.table(id = c("c", "d", "e"), comment = c("big", "slow", "nice"))
A <- B[A, on = .(id), mult = 'first']
format(object.size(A), units = 'b')
A
A <- data.table(id = letters[1:10], amount = 1:10)
B <- data.table(id = c("c", "d", "e"), comment = c("big", "slow", "nice"))
A[, comment := B[A, on=.(id), x.comment, mult = 'first']]
A
format(object.size(A), units = 'b')
I use the set* functions quite often in data.table to update and modify data, but I couldn't understand the real advantage of doing it in a join. What happens internally when I join and assign back to the same object? Is it the same as modifying the original data.table in place, or is it making a copy?
I already read the topic "update by reference" vs shallow copy and the data.table vignettes, but I'm still not understanding it.
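One way to see the difference for yourself is to compare memory addresses before and after each approach with `data.table::address()` (the variable names below just mirror the example above; this is a sketch, assuming the same toy tables):

```r
library(data.table)

A <- data.table(id = letters[1:10], amount = 1:10)
B <- data.table(id = c("c", "d", "e"), comment = c("big", "slow", "nice"))

# Update by reference: the column is added in place,
# so A keeps the same memory address.
address(A)
A[, comment := B[A, on = .(id), x.comment, mult = "first"]]
address(A)   # unchanged

# Join-and-reassign: B[A] builds a brand new table and the
# name A is rebound to it, so the address changes.
A2 <- data.table(id = letters[1:10], amount = 1:10)
address(A2)
A2 <- B[A2, on = .(id), mult = "first"]
address(A2)  # different: a new table was allocated
```

If the address stays the same, no new table was materialized for `A`; if it changes, the join produced a fresh object.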
Edit: I don't know if this is the right way to time it, but it looks like the second approach is a lot faster than the first one with 10^6 replications of table A.
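For reference, the timings below came from something like the following microbenchmark setup (a sketch of my test, with `A0` holding the 10^6-row table and each expression starting from a fresh copy so the runs are comparable):

```r
library(data.table)
library(microbenchmark)

n  <- 1e6
A0 <- data.table(id = sample(letters, n, replace = TRUE), amount = seq_len(n))
B  <- data.table(id = c("c", "d", "e"), comment = c("big", "slow", "nice"))

microbenchmark(
  # First approach: join and rebind the name A to the new table
  join_assign = { A <- copy(A0); A <- B[A, on = .(id), mult = "first"] },
  # Second approach: add the matched column by reference
  update_ref  = { A <- copy(A0); A[, comment := B[A, on = .(id), x.comment, mult = "first"]] },
  times = 100
)
```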
First approach
Unit: milliseconds
expr min lq mean median uq max neval
A <- B[A, on = .(id), mult = "first"] 856.9123 5120.108 13495.41 9702.625 18861.52 70319.84 100
Second approach
Unit: milliseconds
expr min lq mean median uq max neval
A[, `:=`(comment, B[A, on = .(id), x.comment, mult = "first"])] 471.6508 612.1226 627.4387 625.0439 641.7865 753.1218 100
If the above is right, there is a huge advantage to using the second method. Is it just because the first method makes a copy after the join? How does R manage these copies, given that I'm assigning to the same object?