Best tidyverse practice for passing column names as variables in function

Question

I am passing a tibble to a user-defined function in which the column names are supplied as variables. After studying this, this, and this, I came up with the working function below. My goal is to include an equivalent function in an R package. My question: while this function works, is there a more correct, best-practice approach within the dplyr/tidyeval/tidyverse world?

library(tidyverse)

dat0 <- tibble( a = seq(as.Date('2022-02-10'), as.Date('2022-03-01'), by = "5 days")
  , b = seq(10,40,10))

myCalc <- function(data, dateIn, numIn, yearOut, numOut) {
  data <- data %>%
    mutate(.
      , {{yearOut}} := lubridate::year(.data[[dateIn]])
      , {{numOut}} := 10 * .data[[numIn]]
      ) %>%
    filter(.
      , .data[[numOut]] > 250
      )
}

dat2 <- myCalc(dat0
  , dateIn  = "a"
  , numIn   = "b"
  , yearOut = "c"
  , numOut  = "d")

dat2

# A tibble: 2 × 4
  a              b     c     d
  <date>     <dbl> <dbl> <dbl>
1 2022-02-20    30  2022   300
2 2022-02-25    40  2022   400

For starters, your function does not return anything. You should add `return(data)` to be explicit, or you can do an implicit return by not assigning the output of your pipe chain to `data`. — LMc, Jul 08 '22 at 14:52
@LMc ... though in _this_ case, it will invisibly return the final value of `data` without an explicit `data` or `return(data)` on the final line. I generally agree, though: it's usually better to be explicit here, since all too often I've later done something else after that calc that resulted in returning something else ... — r2evans, Jul 08 '22 at 14:54
greengrass62, the only further step you might take would be to use NSE in your arguments so they don't require quotes, as in `myCalc(dat0, a, b, c, d)`. I think this adds a bit of complexity, though, which can make it a little more difficult to maintain/troubleshoot. I think your function is fine as-is. — r2evans, Jul 08 '22 at 14:57
Sure: the *Metaprogramming* set of articles under https://rlang.r-lib.org/ — r2evans, Jul 08 '22 at 15:08
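The point about implicit returns can be checked in base R. A minimal sketch, using hypothetical function names `f_assign` and `f_explicit`, showing that a function whose last expression is an assignment still returns that value, only invisibly, while an explicit final expression returns it visibly:

```r
# Last expression is an assignment: the value is returned, but invisibly
f_assign <- function(x) {
  y <- x * 2
}

# Last expression is the object itself: the value is returned visibly
f_explicit <- function(x) {
  y <- x * 2
  y
}

# withVisible() reports both the returned value and its visibility flag
withVisible(f_assign(1))
withVisible(f_explicit(1))
```

Both calls return `2`; only the `visible` flag differs, which is why typing `f_assign(1)` at the console prints nothing.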

score 1 · Accepted Answer · answered Jul 08 '22 at 15:06

Since you are already using the curly-curly {{ operator, you can implement it further in your function so that the arguments are unquoted:

myCalc <- function(data, dateIn, numIn, yearOut, numOut) {
  data <- data %>%
    mutate(.
           , {{yearOut}} := lubridate::year({{ dateIn }})
           , {{numOut}} := 10 * {{ numIn }}
    ) %>%
    filter(.
           , {{ numOut }} > 250
    )
  
  return(data)
}

Your use of strings does work (e.g. .data[[dateIn]] evaluates to .data[["a"]] in your example). As mentioned in the comments by @r2evans, the real difference comes in how the function is called.

This function would be called like so (note the lack of quotes in the arguments):

dat2 <- myCalc(dat0, 
               dateIn  = a,
               numIn   = b,
               yearOut = c,
               numOut  = d)
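To confirm that the string-based and bare-name approaches give the same result, here is a self-contained sketch. The names `myCalcStr` and `myCalcSym` are hypothetical; the string version uses `!!` to inject the string names on the left of `:=` (one documented way to name a column from a string), which differs slightly from the question's `{{ }}` on the left-hand side.

```r
library(dplyr)

dat0 <- tibble(
  a = seq(as.Date("2022-02-10"), as.Date("2022-03-01"), by = "5 days"),
  b = seq(10, 40, 10)
)

# Column names passed as strings: .data[[ ]] looks them up on the
# right-hand side, !! injects them as new column names on the left of :=
myCalcStr <- function(data, dateIn, numIn, yearOut, numOut) {
  data %>%
    mutate(
      !!yearOut := lubridate::year(.data[[dateIn]]),
      !!numOut  := 10 * .data[[numIn]]
    ) %>%
    filter(.data[[numOut]] > 250)
}

# Column names passed as bare symbols: {{ }} forwards them to dplyr
myCalcSym <- function(data, dateIn, numIn, yearOut, numOut) {
  data %>%
    mutate(
      {{ yearOut }} := lubridate::year({{ dateIn }}),
      {{ numOut }}  := 10 * {{ numIn }}
    ) %>%
    filter({{ numOut }} > 250)
}

res_str <- myCalcStr(dat0, "a", "b", "c", "d")
res_sym <- myCalcSym(dat0, a, b, c, d)
identical(res_str, res_sym)
```

The choice is mostly about the calling convention: strings are easier to build programmatically, while bare names read more like ordinary dplyr code.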

You can read more about this with ?rlang::`nse-defuse` and ?rlang::`nse-force`. There is also this tidyverse article with more on the subject.
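The defuse/force help pages describe `{{ }}` as a shorthand for defusing an argument with `enquo()` and injecting it with `!!`. A minimal sketch of that equivalence, using hypothetical functions `times_ten_curly` and `times_ten_enquo`; the explicit version converts the output name to a string with `rlang::as_name()` before injecting it on the left of `:=`:

```r
library(dplyr)
library(rlang)

dat <- tibble(b = seq(10, 40, 10))

# {{ }} defuses the argument and injects it in one step
times_ten_curly <- function(data, numIn, numOut) {
  mutate(data, {{ numOut }} := 10 * {{ numIn }})
}

# The same thing spelled out with defuse (enquo) and force (!!)
times_ten_enquo <- function(data, numIn, numOut) {
  numIn   <- enquo(numIn)                  # defuse the expression the caller typed
  outName <- as_name(enquo(numOut))        # defuse, then turn the bare name into a string
  mutate(data, !!outName := 10 * !!numIn)  # inject both into mutate()
}

identical(times_ten_curly(dat, b, d), times_ten_enquo(dat, b, d))
```

In package code, `{{ }}` is usually the shorter choice; the `enquo()`/`!!` form is mainly useful when you need to inspect or transform the captured expression before using it.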

@greengrass62 here is a link to suggested [tidy syntax](https://style.tidyverse.org/syntax.html). — LMc, Jul 08 '22 at 15:40
