1

I am relying on https://dplyr.tidyverse.org/articles/programming.html to create interactive tidy functions. These rely on environment variables (as the article calls them) such as the following example.

var_summary <- function(data, var) {
  data %>%
    summarise(n = n(), min = min({{ var }}), max = max({{ var }}))
}
mtcars %>% 
  group_by(cyl) %>% 
  var_summary(mpg)
#> `summarise()` ungrouping output (override with `.groups` argument)

However, when I try a similar method using left_join() I receive an error. Reproducible example below.

# A table
foobar <- tribble(~fooname, ~value, "setosa", 20, "versicolor", 30, "virginica", 10)

# A function
foobarjoin <- function(table, joincol){iris %>% left_join(table, by = c("Species" = {{ joincol }}))}

# When I use the function
 foobarjoin(table = foobar, joincol = fooname)

#> Error in standardise_join_by(by, x_names = x_names, y_names = y_names) : 
#>  object 'fooname' not found 

What is the correct way to use environment variables to perform a dplyr join within a custom function?


Note that this is not a duplicate question of How to join (merge) data frames (inner, outer, left, right). That question is a vanilla join question. This one is specifically about how to use environment variables within functions to achieve a join.

RDavey
  • 1,530
  • 10
  • 27
  • Note that this is not a duplicate question of https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right. That question is a vanilla join question. This one is specifically about how to use environment variables within functions to achieve a join. – RDavey Sep 10 '20 at 11:57

2 Answers2

1

left_join or joins in general need character values. So change your function to :

library(dplyr)
foobarjoin <- function(table, joincol) {
       iris %>% left_join(table, by = c("Species" = joincol))
}

and you can call it as :

foobarjoin(table = foobar, joincol = "fooname")

#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species value
#1            5.1         3.5          1.4         0.2     setosa    20
#2            4.9         3.0          1.4         0.2     setosa    20
#3            4.7         3.2          1.3         0.2     setosa    20
#4            4.6         3.1          1.5         0.2     setosa    20
#5            5.0         3.6          1.4         0.2     setosa    20
#6            5.4         3.9          1.7         0.4     setosa    20
#7            4.6         3.4          1.4         0.3     setosa    20
#8            5.0         3.4          1.5         0.2     setosa    20
#...
#...
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

Weirdest thing for me... CmdLvl is a character variable in my global environment. For some reason, I can't get this to work:

Output <- my_df1 %>% 
    left_join(my_df2 , by = c(CmdLvl = "RSID"))

But this DOES work:

 Output <- my_df2 %>% 
    right_join(my_df1 , by = c("RSID" = CmdLvl))

I've tried many variations of !!CmdLvl, !! sym(CmdLvl), Paste0(CmdLvl), etc... Nothing works. Another way I've found around this was to copy the column and join on it, instead, like this:

 Output <- my_df1 %>% 
    mutate(linker = !! sym(CmdLvl)) %>% #couldn't get CmdLvl to work in leftjoin below, so I replicate the CmdLvl (Unit) column and use that instead
    left_join(my_df2 , by = c("linker"  = "RSID"))

Then, I can just delete the 'linker' column. I think right_join is the best solution until I can figure out why left_join won't allow the use of a variable from the global environment as the left-side join column reference.