1

I am using data.table to do a one-to-many merge. Instead of matching with all the rows, the output is showing only the last matched row for each unique value of the key.

a <- data.table(x = 1:2L, y = letters[1:4])
b <- data.table(x = c(1L,3L))

setkey(a,x)
setkey(b,x)

I want to do a many to one (b to a) join based on column x.

c <- a[b,on=.(x)]
c
#    x  y
# 1: 1  a
# 2: 1  c
# 3: 3 NA

However, this approach creates a new data.table called c, instead of making a new data.table, I use the following code to add the column y with b.

b[a,y:=i.y]

Now b looks like,

b
#    x  y
# 1: 1  c
# 2: 3 NA

The desired output is the one in the first method (c). Is there a way of using := and output all the rows instead of the last matched row alone?

PS: The reason I want to use method 2 using := is because my data is huge and I do not want to make copies. The example I showed reflects what happens in my data.

  • 1
    There is no way achieving this using `:=` I'm aware of. You are basically want to add new rows to `b` by reference. This has been asked before already, see https://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-data-table – David Arenburg May 09 '18 at 18:19
  • Yeah, I think a new table is the right output for this operation. All data.table modifications by reference preserve #rows. You could collapse the values from `a`, eg `b[, y := a[b, toString(x.y), by=.EACHI]$V1]`, though I guess that is unlikely to be useful unless the values are only for browsing / not used in later analysis. – Frank May 09 '18 at 19:24

0 Answers0