2

I want to merge two data tables on multiple columns using data table.

## The two tables
A <- data.table(a = 1:4, b = 12:15, c = 10:13)
   a  b  c
1: 1 12 10
2: 2 13 11
3: 3 14 12
4: 4 15 13

B <- data.table(a = 1:3, b = 12:14, d = 10:12, e = 10:12)
   a  b  d  e
1: 1 12 10 10
2: 2 13 11 11
3: 3 14 12 12    

I would like to get this result:

A <- merge(A, B, by = c('a', 'b'), all.x = T)
   a  b  c  d  e
1: 1 12 10 10 10
2: 2 13 11 11 11
3: 3 14 12 12 12
4: 4 15 13 NA NA

This answer showed that data table can edit the A data frame in-memory, so it's more efficient when using large data sets. The code suggested is

A[B, bb:=i.b, on='a']

How can I alter this code so the A data frame is merged in-memory, and the result will be the same as above?

  • 1
    See https://stackoverflow.com/questions/41439976/update-a-subset-of-dataframe-rows-and-columns-from-another-dataframe/41440339#41440339 – David Arenburg Aug 14 '17 at 15:11

1 Answers1

2

If we meant to do a join. Here the i.d or i.e is not needed as the columns are unique in the 'B'

nm1 <- setdiff(names(B), names(A)) 
A[B, (nm1) :=  mget(nm1), on = .(a, b)]
A
#   a  b  c  d  e
#1: 1 12 10 10 10
#2: 2 13 11 11 11
#3: 3 14 12 12 12
#4: 4 15 13 NA NA
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks! Because I use two data sets with many more columns, is it possible to make this more flexible ? for example using setdiff(names(B), names(A)) in the j section of data table? – Jelger van Zaane Aug 14 '17 at 15:08