2

I have these two DTs that I'd like to perform inner join;

DT1:

   x  y v foo
1: A A1 1   1
2: B A1 2   1
3: C A1 3   1
4: D A2 4   1
5: E A2 5   2
6: F A2 6   1
7: G A3 7   2
8: H A3 8   0
9: I A3 9   0

DT2:

   x z
1: A 1
2: B 2
3: C 3
4: D 4
5: E 5
6: F 6

Inner join of these two by DT[DT2, nomatch=0L, on="x"]will be :

   x  y v foo z
1: A A1 1   1 1
3: C A1 3   1 3
4: D A2 4   1 4
5: E A2 5   2 5
6: F A2 6   1 6

What I want to generate is following:

   x  y v z
1: A A1 1 1
2: B A1 2 2
3: C A1 3 3
4: D A2 4 4
5: E A2 5 5
6: F A2 6 6

As you can see, I want to join two DTs selecting out the column(s) from one of the joining DT. I can do this by removing unwanted columns after joining but I am looking for the one way to do this in an one liner. Any help will be appreciated.

akh22
  • 661
  • 4
  • 16

3 Answers3

3

One option is to use the sqldf package and do a database style join:

sql <- "SELECT t2.x, t1.y, t1.v, t2.z
        FROM DT2 t2
        LEFT JOIN DT1 t1
            ON t2.x = t1.x"
result <- sqldf(sql)

One advantage of using sqldf here is that it makes it easy to choose which columns you want in the result, and in which order.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
2

Since your result table has the same rows as DT2, maybe add them there:

cols = c("y", "v")
DT2[, (cols) := DT1[DT2, on="x", ..cols]]

   x z  y v
1: A 1 A1 1
2: B 2 A1 2
3: C 3 A1 3
4: D 4 A2 4
5: E 5 A2 5
6: F 6 A2 6

This springs a warning, but there's a bug report filed about that.

You might also be interested in this Q&A: Perform a semi-join with data.table

Frank
  • 66,179
  • 8
  • 96
  • 180
1

This gets you fairly close. The first step is logical selection of rows of DT1 whose x's are in DT2[["x"]] and then binding the z values:

cbind( DT1[x %in% DT2[,x] ], DT2[ , z] )

   x  y v foo V2
1: A A1 1   1  1
2: B A1 2   1  2
3: C A1 3   1  3
4: D A2 4   1  4
5: E A2 5   2  5
6: F A2 6   1  6

If you quote the j-argument in the second data.table, and leave out the foo it is an exact answer:

 cbind( DT1[x %in% DT2[,x], list(x,y,v)],  DT2[ , "z"] )
#-------------------
   x  y v z
1: A A1 1 1
2: B A1 2 2
3: C A1 3 3
4: D A2 4 4
5: E A2 5 5
6: F A2 6 6

Instead of quoting "z" you could have used DT2[ , list(z)]

IRTFM
  • 258,963
  • 21
  • 364
  • 487