More complex joins in dplyr

Question

In SQL, you can execute joins by using commands like:

select x.var1, y.var2
from x left join y
on substring(x.var1, 1, 5) = y.var2;

Can you do similar manipulation of variables in dplyr (in R) when doing joins, or do the joins have to be done with only exact matches?

One comment suggested creating intermediate variables, but I wanted to know if there was a way around that.

(edited on statement to make it clearer)

Please provide the reproducible examples and the desired output. — www, May 31 '18 at 03:43
You can first create an additional column in `x` with the substring and then join on that column. — Ralf Stubner, May 31 '18 at 04:34
@mnmn See this similar [post](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) with detailed answers. Alternatively, see the dplyr [page](https://dplyr.tidyverse.org/reference/join.html) or else see this nice [cheatsheet](http://stat545.com/bit001_dplyr-cheatsheet.html) — mnm, May 31 '18 at 07:05

score 0 · Answer 1 · answered May 31 '18 at 09:44

0

It's hard to be precise without a reproducible example, but in dplyr, you can get a similar effect to your SQL query with this code, by including your substring operation in a piped mutate before the join. Like so: (dummy data used is nonsense, of course)

x <- data.frame(n = c("00000000000", "111111111"), var1 = 1:2)
y <- data.frame(name= as.character(c("00000", "11111")), var2=3:4)


df <- x %>% 
  mutate(name = substr(n, 1,5)) %>%
  left_join(y, by="name") %>%
  select(var1,var2)

answered May 31 '18 at 09:44

Luke Hayden

692
4
8

This makes sense. Could you also incorporate the mutate within the by statement? Like 'by.x = mutate(x, name = substr(n, 1,5)), by.y = "name" ', or would that not work? – mnmn May 31 '18 at 21:14
I don't think you'd be able to put the mutate there, as the by statement is specifying which column to join with. You could incorporate it into the call to left_join, taking out the pipe, if you wish. Like so: left_join(mutate(x, name = substr(n, 1,5)), y, by="name") %>% select(var1,var2) – Luke Hayden Jun 01 '18 at 07:53

More complex joins in dplyr

1 Answers1