As of May 2022, we now also have the option of using join_by()
, which, in addition to allowing joining by specific columns, like in Dave's answer, allows a variety of other ways of joining two dataframes.
We can use:
- Equality condition: ==
- Inequality conditions: >=, >, <=, or <
- Rolling helper: closest()
- Overlap helpers: between(), within(), or overlaps()
Examples:
# first we create a dataset similar to Dave's one, but with a few more columns, which make it easy to demonstrate the other joins
library(tidyverse)
set.seed(0)
dfx <- tibble(
id = 1:3,
first_name = c("Alice", "Bob", "Charlie"),
last_name = c("Adams", "Baker", "Chaplin"),
a = rnorm(3),
lb = 0.25,
ub = 0.75)
dfy <- tibble(
id = 1:3,
first_name = c("Alice", "Bob", "Charlie"),
last_name = c("Adams", "Baker", "Chaplin"),
b = rnorm(3),
other_range = 0,
other_range2 = 1)
Equality join (what OP asked for):
left_join(dfx, dfy, join_by(id, first_name, last_name == last_name))
Note: if the names of the columns you want to join by are the same in both dataframes, you don't need to do col == col, you can just use col, as in the first two columns in the above example.
Inequality join:
left_join(dfx, dfy, join_by(a < b)) # join the rows where a < b
Rolling joins:
left_join(dfx, dfy, join_by(closest(a < b))) # similar to above, but only take the closest match
Overlap joins:
left_join(dfx, dfy, join_by(between(a, other_range, other_range2))) # join rows where a is between other_range and other_range2
left_join(dfx, dfy, join_by(overlaps(lb, ub, other_range, other_range2))) # join rows where the the ranges (lb to ub, and other_range to other_range2) overlap
left_join(dfx, dfy, join_by(within(lb, ub, other_range, other_range2))) # join rows where lb to ub is within other_range to other_range2
Another note: join_by()
assumes you'll list columns for the left column before the right one. If you don't want to do that for some reason, use x$
for the left dataframe, and y$
for the right dataframe, e.g. join_by(x$a < y$b)
.
For more information, read the documentation.
Update:
I realised I never actually addressed the nub of OP's question:
Something like this: left_join(x, y, by = c("a c" = "b d")
You can't do that specifically, because dplyr expects every string to be the name of a column.
However, if you have two strings which contain columns separated by spaces, you could do something like this:
j1 <- "id first_name last_name"
j2 <- j1 # let's pretend for the sake of argument they are different, as it doesn't change the answer
join_vec <- function(j1, j2) {
setNames(str_split(j2, " ")[[1]], str_split(j1, " ")[[1]])
}
left_join(dfx, dfy, by = join_vec(j1, j2))