Assume we have the following simple case, taken from the data.table documentation (via ?data.table):

DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))

I don't understand the labels given in the documentation example:

DT[X, on="x"] # right join (1)
X[DT, on="x"] # left join (2)

How can I say that the first case (1) is a right join? As I understand it, I need to know which table is "right" and which is "left" before I can call something a left outer or a right outer join.

When I execute the commands, (1) results in a join of X on DT, and (2) results in a join of DT on X.

So the correct labeling, from my understanding, would be: (1) outer merge of X on DT and (2) outer merge of DT on X, but I cannot make any statement about the direction. The labels suggest that (1) is the reverse of (2), which does not seem to be true. What am I missing?
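To make the comparison concrete, here is a minimal sketch that runs both calls and sets them against `merge()` with DT fixed as the left table and X as the right table. That left/right assignment is an assumption made here for illustration, not something stated in the documentation example, and the names `r1`, `r2`, `m_right` and `m_left` are only illustrative.

```r
library(data.table)

# Tables from the ?data.table example
DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

r1 <- DT[X, on = "x"]  # (1): one lookup per row of X, so every row of X is kept
r2 <- X[DT, on = "x"]  # (2): one lookup per row of DT, so every row of DT is kept

# Assumption: DT is the "left" table and X is the "right" table in both calls
m_right <- merge(DT, X, by = "x", all.y = TRUE)  # keep all rows of X
m_left  <- merge(DT, X, by = "x", all.x = TRUE)  # keep all rows of DT

nrow(r1)       # 6 (x = "a" is dropped, because X has no "a")
nrow(m_right)  # 6
nrow(r2)       # 9 (x = "a" is kept, with NA in the columns coming from X)
nrow(m_left)   # 9
```

With that assumption, (1) keeps the same set of keys as `all.y = TRUE` and (2) the same set as `all.x = TRUE`, although row order and column names differ between `[` and `merge()`.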

Hieu Nguyen
ghx12
  • So if you right join DT and X, it is `DT[X, on="x"]`, and if you left join DT and X, it is `X[DT, on="x"]`. The direction is clear, but the order in which you mention the tables is kind of reversed in the data.table syntax. – s_baldur May 25 '23 at 09:59
  • Not sure if I understand your reply. The issue I have with `DT[X, on="x"]`, which is labelled as a "right join of DT and X", is that it must also be a "left join of X on DT", based on the logic of your second example. But there is a conflict, as a right join and a left join of two tables usually do not lead to the same result. – ghx12 May 25 '23 at 10:04
  • There is no conflict: it's a right join AND not a left join. – s_baldur May 25 '23 at 10:17
  • Ok, but why? `DT[X, on="x"]` looks to me like a left join of X on DT. Otherwise, how can the second case `X[DT, on="x"]` be a left join of DT on X? – ghx12 May 25 '23 at 10:20
  • Sure. It's a left join if you take "X" as the reference table. Maybe this helps: https://stackoverflow.com/questions/63202116/left-join-vs-right-join – s_baldur May 25 '23 at 10:30
  • Alright, so once we state that DT is the reference table, the notation in the documentation is fine? I was missing that point in the documentation, because the reference table could also be X. – ghx12 May 25 '23 at 11:32
