1

I have two tables A1 and A2 that I would like to join on all columns (purpose is to have a function that can be used to track changes to the data). Is there a easy way to join on all columns without specifying them explicitly.

Something like A1[A2, on="ALL"]

If anyone needs some code example I can provide it, but the question is more generally about if I have missed something in the help section for data.table.

I'm looking for a data.table solution. If there is none I'll use dplyr's:

inner_join(A1, A2, by = NULL)
ErrantBard
  • 1,421
  • 1
  • 21
  • 40

2 Answers2

2

I think that the way to do this in the data table is the following:

require(data.table)
dt1 <- data.table(A1 = c(1,2,3), A2 = c("A", "B", "D"))
dt2 <- data.table(A1 = c(3,2,3), A2 = c("A", "B", "C"))

setkeyv(dt1, names(dt1))
setkeyv(dt2, names(dt2))

and the inner join on all common columns is:

dt1[dt2, nomatch = 0]

Other options include the following (credits to Frank in the comments):

dt1[dt2, on=names(dt2), nomatch = 0]

This has the benefit of not requiring to key the data table. (More info can be found here: What is the purpose of setting a key in data.table? )


Another option using the data sets operations (available in version 1.9.7 or later):

fintersect(dt1, dt2)
Community
  • 1
  • 1
User2321
  • 2,952
  • 23
  • 46
  • 3
    These days, it's recommended to use `on=` instead of setting keys: http://stackoverflow.com/a/20057411/ so `dt1[dt2, on=names(dt2), nomatch = 0]`. Also, in the current devel version there are set operators for data sets, so `fintersect(dt1, dt2)` is now an option. – Frank Oct 17 '16 at 14:34
  • 1
    Didn't know! Thank you very much I will keep it in mind :) (And add it to the answer if you don't mind) – User2321 Oct 17 '16 at 14:37
  • 1
    Sorry, didn't see the edit to your comment before. The `intersect` from the question you linked is different from the "fast" `fintersect` I was talking about and is only relevant for vectors, not tables of data. – Frank Oct 17 '16 at 15:23
  • 2
    up to date manual for [`fintersect`](http://jangorecki.gitlab.io/data.table/library/data.table/html/setops.html). When joining on all columns be aware that joining on double column type is not a best idea due to floating point rounding differences. – jangorecki Oct 17 '16 at 15:31
  • Thank you both for the changes and the info! – User2321 Oct 17 '16 at 15:42
  • Yeah, thx. I noticed some problems with sort, before I applied the on-argument so that was also an improvment – ErrantBard Oct 18 '16 at 12:41
1

Using the given example, another option is the 'dplyr' package.

library(dplyr)
df1 <- data.frame(A1 = c(1,2,3), A2 = c("A", "B", "D"))
df2 <- data.frame(A1 = c(3,2,3), A2 = c("A", "B", "C"))

inner_join(df1, df2)
sluedtke
  • 314
  • 1
  • 7