
Data:

library(data.table)
# A: ten ids, each with a random amount
A <- data.table(id = letters[1:10], amount = rnorm(10)^2)
# B2: note that id "e" appears twice
B2 <- data.table(
  id = c("c", "d", "e", "e"),
  ord = 1:4,
  comment = c("big", "slow", "nice", "nooice")
)

I'm trying to left-join by reference with data.table, following this solution:

A[B2, on = .(id), names(B2)[2:3] := mget(paste0("i.", names(B2)[2:3]))]

This gives the following output:

id  amount      ord  comment
a   0.10210291  NA   NA
b   0.01255382  NA   NA
c   0.83172798  1    big
d   0.18312460  2    slow
e   0.98596235  4    nooice
f   0.78437310  NA   NA
g   6.34467810  NA   NA
h   1.12852702  NA   NA
i   0.23695322  NA   NA
j   0.48943532  NA   NA
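In the output above, A keeps its 10 rows and only one of B2's two "e" rows ends up in it (here the last one, "nooice"). A minimal sketch that reproduces this with the question's tables (`set.seed()` is added only for reproducibility) and, as an extra step not in the original, lists duplicated join keys in B2 before running the update join. The helper names `dup_keys` and `cols` are illustrative.

```r
library(data.table)

# The question's tables; set.seed() is added here only for reproducibility.
set.seed(1)
A  <- data.table(id = letters[1:10], amount = rnorm(10)^2)
B2 <- data.table(
  id      = c("c", "d", "e", "e"),
  ord     = 1:4,
  comment = c("big", "slow", "nice", "nooice")
)

# Join keys that occur more than once in B2 (here: "e", twice).
dup_keys <- B2[, .N, by = id][N > 1L]
print(dup_keys)

# Update join, as in the question: adds ord and comment to A in place.
cols <- c("ord", "comment")
A[B2, on = .(id), (cols) := mget(paste0("i.", cols))]

# A still has 10 rows and "e" appears only once: an update join can
# overwrite values in existing rows of A but never adds rows.
print(nrow(A))
print(A[id == "e"])
```

If `dup_keys` is non-empty, an update join will silently keep only one match per row of A, so this check is a cheap way to notice the situation described in the question.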

There is a duplicate "e" in B2, so I expected an extra row in the final output. I do get that row when I use left_join() from dplyr (ignore the different random numbers in the "amount" column):

library(dplyr)
left_join(A, B2, by = "id")

id  amount     ord  comment
a   0.4778922  NA   NA
b   1.4659516  NA   NA
c   0.7857094  1    big
d   0.6697439  2    slow
e   0.2903246  3    nice     <-
e   0.2903246  4    nooice
f   6.8514519  NA   NA
g   1.7866884  NA   NA
h   0.9687253  NA   NA
i   0.7872538  NA   NA
j   2.0517777  NA   NA

How do I produce the same output via data.table by reference?
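Adding a row means changing how many rows A has, which an update by reference cannot do. A minimal sketch of the closest data.table equivalent, assuming it is acceptable to build a new table and rebind the name `A` to it: `B2[A, on = .(id)]` keeps every row of A and returns one row per match, so the duplicated "e" yields two rows as with `left_join()`. `setcolorder()` then moves A's columns to the front.

```r
library(data.table)

# The question's tables; set.seed() is added here only for reproducibility.
set.seed(1)
A  <- data.table(id = letters[1:10], amount = rnorm(10)^2)
B2 <- data.table(
  id      = c("c", "d", "e", "e"),
  ord     = 1:4,
  comment = c("big", "slow", "nice", "nooice")
)

# X[Y, on = ...] returns one row per match for every row of Y (NA where
# there is no match), so B2[A, ...] is a left join of A to B2. The two
# "e" rows in B2 give two output rows, as dplyr::left_join() does.
# This builds a NEW table and rebinds the name A to it; it is not an
# update by reference.
A <- B2[A, on = .(id)]

# Join results list B2's columns first; move A's columns to the front.
setcolorder(A, c("id", "amount"))
print(A)
```

Any other variable that still points to the original A is unaffected by the reassignment, since only the name `A` is rebound.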

  • You cannot do it by reference, since that would mean altering A in place to change how many rows it has. Related to: https://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-data-table If you just want the left-joined table, there's `B2[A, on=.(id)]` (which creates a new table instead of changing either table by reference). Also related: https://stackoverflow.com/a/54313203 – Frank Jun 25 '19 at 18:40
  • Are there any advantages to this syntax over `merge(A, B2, all.x = TRUE)`? merge gives me the columns in the order I prefer (A first, then B2). – Yandle Jun 25 '19 at 19:03
  • I don't think so, unless you prefer the `[]` syntax, perhaps for consistency with other parts of your code that use it, or you need its extra arguments (`mult=`, `nomatch=`; see `?data.table`). I only end up doing this sort of join interactively at the R prompt when exploring data, so I'm not sensitive to tradeoffs that others might recognize between the two options. – Frank Jun 25 '19 at 19:09
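A hedged sketch comparing the two options from the comments, again assuming the question's tables. Both `merge(A, B2, by = "id", all.x = TRUE)` and `B2[A, on = .(id)]` return the same 11-row left join, with merge() putting A's columns first. The `mult=` and `nomatch=` arguments change what the `[` join returns: `mult = "first"` keeps a single B2 match per row of A, and `nomatch = NULL` drops rows of A with no match (an inner join, which is also what merge() does by default). merge.data.table() has no `mult` argument. The object names below are illustrative.

```r
library(data.table)

# The question's tables; set.seed() is added here only for reproducibility.
set.seed(1)
A  <- data.table(id = letters[1:10], amount = rnorm(10)^2)
B2 <- data.table(
  id      = c("c", "d", "e", "e"),
  ord     = 1:4,
  comment = c("big", "slow", "nice", "nooice")
)

# merge(): left join via all.x = TRUE; by-column first, then A's, then B2's.
m_left <- merge(A, B2, by = "id", all.x = TRUE)

# [ ] join: same rows as m_left, but B2's columns come first.
j_left <- B2[A, on = .(id)]

# mult = "first": one row per row of A, keeping the first matching B2 row
# ("nice" for id "e").
j_first <- B2[A, on = .(id), mult = "first"]

# nomatch = NULL: drop rows of A without a match (inner join).
# Older data.table versions use nomatch = 0L instead.
j_inner <- B2[A, on = .(id), nomatch = NULL]

# Row counts of the four results.
print(sapply(list(m_left = m_left, j_left = j_left,
                  j_first = j_first, j_inner = j_inner), nrow))
```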
