
In the inner_join function from R's dplyr package, the parameter by is documented as "a character vector of variables to join by".

The two sample tables are as follows:

library(dplyr)

tb1 <- data.frame(var1 = c(1, 3, 8, 4, 2), var2 = c(-1.1, 3.3, 4.2, 2.3, -3.2), key = c("a", "c", "b", "c", "a"))
tb1

tb2 <- data.frame(key = c("a", "b", "c"), var3 = c("Ada", "Byron", "Cleopatra"))
tb2

The two (supposedly equivalent) methods I tried are:

(1)
key= c("a","c","b","c","a")
inner_join(tb1,tb2,by="key")
(2)
inner_join(tb1,tb2,by=c("a","c","b","c","a"))

Method (2) produces an error: "Error: Join columns must be unique. x Problem at position 4 and 5." In both cases, by is a character vector. What have I misunderstood?

  • 3
    The two are not equivalent: with `by="key"` it is using the *column* named `key`, not the *variable* named `key` external to the frame. `by=` needs column names, not column values. I suggest you read https://stackoverflow.com/q/1299871/3358272 and https://stackoverflow.com/a/6188334/3358272. – r2evans Sep 09 '20 at 16:05
  • 1
    The reason you get the "unique" error is that one of the first things `inner_join` is likely to do is verify the `by=` arguments in at least two ways: (1) do they exist in both frames; and (2) are they unique? (Not in that order, and likely in a `{tidyselect}` way.) If your second example did not have duplicate elements, then instead you would have received a slightly better error: `Error: Join columns must be present in data.`, indicating that the `by` columns must be ... *column names* :-) – r2evans Sep 09 '20 at 16:09
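
To make the point from the comments concrete, here is a minimal sketch using the two tables from the question. The names `join_col`, `tb2_renamed` and `id` are hypothetical and introduced only for illustration; `stringsAsFactors = FALSE` is added so the key columns are plain character vectors on any R version.

```r
library(dplyr)

tb1 <- data.frame(var1 = c(1, 3, 8, 4, 2),
                  var2 = c(-1.1, 3.3, 4.2, 2.3, -3.2),
                  key  = c("a", "c", "b", "c", "a"),
                  stringsAsFactors = FALSE)
tb2 <- data.frame(key  = c("a", "b", "c"),
                  var3 = c("Ada", "Byron", "Cleopatra"),
                  stringsAsFactors = FALSE)

# `by` takes the *name* of the column present in both frames;
# the values in that column are matched by the join itself.
joined <- inner_join(tb1, tb2, by = "key")

# A variable can hold the column name, but not the column values.
join_col <- "key"  # hypothetical helper variable
joined_via_variable <- inner_join(tb1, tb2, by = join_col)

# If the key column is named differently in each frame, a named
# character vector maps the left-hand name to the right-hand name.
tb2_renamed <- rename(tb2, id = key)  # hypothetical renamed copy
joined_named <- inner_join(tb1, tb2_renamed, by = c("key" = "id"))

# Unique values that are not column names still fail, just with a
# different error than the "must be unique" one.
res <- tryCatch(inner_join(tb1, tb2, by = c("a", "b", "c")),
                error = function(e) e)
```

In this sketch `joined` keeps all five rows of `tb1` and gains a `var3` column looked up from `tb2`, while `res` ends up holding an error condition rather than a data frame. The exact wording of that error may differ between dplyr versions.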
