2

I am currently familiarizing myself with data.table (for the amazing speed, as well as non-equi joins).

I find the join syntax a little counterintuitive. Could someone help me understand how to think about left and right joins the data.table way?

Examples from r-datatable.com

require(data.table)
example(data.table)
# joins as subsets
X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))
X

DT[X, on="x"]                               # right join
X[DT, on="x"]                               # left join
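A minimal sketch for checking which table's rows survive each join. It assumes `DT` has the same shape as the table built by `example(data.table)`, and defines it explicitly so the snippet is self-contained. The names `res_right` and `res_left` are illustrative only.

```r
library(data.table)

# Assumption: same shape as the DT created by example(data.table)
DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  <- data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# DT[X]: one lookup per row of X, so every row of X is represented
res_right <- DT[X, on = "x"]

# X[DT]: one lookup per row of DT, so every row of DT is represented
res_left <- X[DT, on = "x"]

nrow(res_right)      # 6: each of the 2 rows in X matches 3 rows in DT
nrow(res_left)       # 9: one row per row of DT
res_left[x == "a"]   # rows of DT without a match in X carry NA in X's columns
```

In both calls the table inside the brackets decides which rows are kept, which is why swapping the two tables turns the "right join" into a "left join".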

Right Join is the default and the new object (X) is right joined?

stats-hb
  • 2
    Same goes for me, I prefer to do joins using `merge`, which in my opinion in most cases is just more intuitive. See also https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html – hannes101 Feb 25 '19 at 10:34
  • 2
    For the left-join part of your question, this is a really good post that you could go through: https://stackoverflow.com/a/54313203/8583393 – markus Feb 25 '19 at 10:34
  • 1
    use "merge" on `data.table` objects. Method dispatching will make sure that you get data.table's speed gain. – abhiieor Feb 25 '19 at 10:50
  • @abhiieor will this also work for non-equi joins? :) – stats-hb Feb 25 '19 at 10:59
  • 2
    When you have a `X[Y]` join it means: "*For every value in `Y` try to join a value from `X`*", hence, basically this is a left join to `Y` and the result will be the length of `Y` (I agree it's kind of counter-intuitive). – David Arenburg Feb 25 '19 at 11:17
  • 5
    I think this post, including the 'summary' in the actual question, is useful: [Why does XY join of data.tables not allow a full outer join, or a left join?](https://stackoverflow.com/questions/12773822/why-does-xy-join-of-data-tables-not-allow-a-full-outer-join-or-a-left-join). Then of course there is jangorecki's data.table answer in the canonical join Q&A: [How to join (merge) data frames (inner, outer, left, right)?](https://stackoverflow.com/a/34219998/1851712). And, not least, @Frank's [excellent tutorial](http://franknarf1.github.io/r-tutorial/_book/tables.html#dt-joins) – Henrik Feb 25 '19 at 12:13
  • From the tutorial: "Think of `x[i]` as using index table `i` to look up rows of `x`, in the same way an “index matrix” can look up elements of a matrix [...] By default, we see results for every row of `i`, even those that are unmatched.". Perhaps a more useful, 'intuitive' way to describe the `data.table` joins than the traditional left-right axis. – Henrik Feb 25 '19 at 12:59
  • @stats-hb No, not for non-equi joins. For equi joins, however, using `merge` makes your code much more readable. Get into the habit of using data.table's bracket notation `[]` only where it is required, and comment the code where you do, so that another person reading it doesn't get lost. Reserve `[]` for data.table operations or for non-equi joins. – abhiieor Feb 25 '19 at 13:05
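The suggestions in the comments above can be put side by side in a rough sketch. The tables `A`, `B`, `events`, and `windows` are made-up illustration data, not taken from the question.

```r
library(data.table)

# Hypothetical example tables
A <- data.table(id = c(1L, 2L, 3L), a = c("a1", "a2", "a3"))
B <- data.table(id = c(2L, 3L, 4L), b = c("b2", "b3", "b4"))

# Left join (keep all rows of A): merge() vs. bracket notation
left_merge   <- merge(A, B, by = "id", all.x = TRUE)
left_bracket <- B[A, on = "id"]      # A is the index table i, so A's rows are kept

# Right join (keep all rows of B)
right_merge   <- merge(A, B, by = "id", all.y = TRUE)
right_bracket <- A[B, on = "id"]     # B is the index table i, so B's rows are kept

# Inner join: drop the rows of i that have no match
inner_merge   <- merge(A, B, by = "id")
inner_bracket <- A[B, on = "id", nomatch = NULL]

# Full outer join: done with merge() here
full_merge <- merge(A, B, by = "id", all = TRUE)

# Non-equi join with bracket notation:
# for every event, look up the window that contains its time t
events  <- data.table(t = c(1, 5, 9))
windows <- data.table(start = c(0, 4), end = c(2, 6), label = c("w1", "w2"))
res_ne  <- windows[events, on = .(start <= t, end >= t)]
res_ne$label                         # "w1" "w2" NA (t = 9 falls in no window)
```

The bracket results hold the same rows as their `merge()` counterparts, but the column order differs because `x[i]` puts the columns of `x` first.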

1 Answer

-1

Right Join is the default and the new object (X) is right joined?

The reason is consistency with the way base R subsets vectors and matrices. I think there is an FAQ entry on that. Note that when you use `:=` during a join, you get a left join. There is also an issue that discusses the consistency of merges via `[` with base R; as far as I recall, it is #1615.
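As an illustration of both points (a sketch with made-up data, not code from the FAQ or from the issue): base R indexing returns one result per index element, `x[i]` in data.table behaves the same way, and `:=` inside a join updates `x` in place without dropping any of its rows.

```r
library(data.table)

# Base R: one result per index element, NA where the name is not found
v <- c(a = 1, b = 2, c = 3)
looked_up <- v[c("c", "z", "a")]
length(looked_up)                  # 3, the second element is NA

# data.table: x[i] likewise returns one result per row of i
DT  <- data.table(x = c("a", "b", "c"), val = 1:3)
idx <- data.table(x = c("c", "z", "a"))
res <- DT[idx, on = "x"]
nrow(res)                          # 3, val is NA for the unmatched "z"

# Update join: := adds a column to `main` by reference.
# All rows of `main` are kept, so this acts like a left join onto `main`.
main   <- data.table(id = 1:4, grp = c("a", "b", "c", "d"))
lookup <- data.table(grp = c("b", "c"), score = c(10, 20))
main[lookup, on = "grp", score := i.score]
main$score                         # NA 10 20 NA
```

The `i.` prefix refers to the column coming from the table inside the brackets (`lookup` here).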

jangorecki
  • Please add a link to the FAQ, and an example of how to use `:=` in a join. As your answer stands right now, it doesn't really help explain anything. – Olsgaard Apr 19 '22 at 05:43