
I want to write a custom function that wrangles data. The function's input should be:

  1. a data.frame object
  2. the names of the columns in the data that the function should operate on.

I want to program this function such that the argument specifying the column names will be as flexible as possible. Thus, I decided to use dot-dot-dot (...).

However, I don't know how to make the function accept tidyselect's selection helpers (e.g., contains(), starts_with()) through the dots.

Example

For the sake of example, let's say that I want to write a wrapper for dplyr::coalesce().

library(rlang)
library(dplyr)

my_coalesce_func <- function(dat, ...) {
  
  # step 1: save `...` into cols (https://tidyeval.tidyverse.org/multiple.html)
  cols <- enquos(...)
  
  # step 2: mutate new column `new_col` that coalesces columns of interest
  mutate(dat, new_col = coalesce(!!!cols))
}

# toy data
my_df <-
  data.frame(
    col_a = c(2, 2, 5, 5, 3),
    col_b = c(NA, 4, 2, 3, 1),
    col_c = c(4, 5, 3, 1, 2),
    col_d = c(1, NA, 4, 2, 4),
    col_e = c(3, 3, 1, 4, 5),
    extra_col = 1:5
  )

# run the function -- works fine when we explicitly provide column names 
my_coalesce_func(dat = my_df, col_a, col_b, col_c, col_d, col_e)
#>   col_a col_b col_c col_d col_e extra_col new_col
#> 1     2    NA     4     1     3         1       2
#> 2     2     4     5    NA     3         2       2
#> 3     5     2     3     4     1         3       5
#> 4     5     3     1     2     4         4       5
#> 5     3     1     2     4     5         5       3


# run the function -- fails when given a selection helper
my_coalesce_func(dat = my_df, starts_with("col"))
#> Error: Problem with `mutate()` column `new_col`.
#> i `new_col = coalesce(starts_with("col"))`.
#> x `starts_with()` must be used within a *selecting* function.
#> i See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.

Created on 2021-07-01 by the reprex package (v2.0.0)

What do I need to add to my_coalesce_func() so that this call runs successfully?

my_coalesce_func(dat = my_df, starts_with("col"))

The same should work for any other selection helper passed to ....

Thanks!

Emman
  • Related: https://stackoverflow.com/questions/50088528/use-select-helpers-with-dplyrcoalesce. I think the main problem is that such a statement is difficult to write even without wrapping it in a function. Combining `coalesce()` and `starts_with()` seems to be messy in general. – MrFlick Jul 01 '21 at 15:13
  • Another good alternative here: https://stackoverflow.com/questions/64972688/pass-column-names-to-dplyrcoalesce-when-writing-a-custom-function – MrFlick Jul 01 '21 at 15:14
  • I believe this is a duplicate of either or both of the linked questions. – Ian Campbell Jul 01 '21 at 15:21
  • Yes, it is a duplicate, especially of the second link @MrFlick referred to. – Emman Jul 01 '21 at 15:57

1 Answer


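Based on this answer, one trick is to let dplyr::select() resolve the columns first, since select() is a selecting function and therefore accepts tidyselect helpers: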

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

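my_coalesce_func <- function(dat, ...) {
  dat %>%
    # select() is a selecting function, so tidyselect helpers work here
    select(...) %>%
    # splice the selected columns into coalesce() as separate arguments
    transmute(new_col = coalesce(!!!.)) %>%
    # append new_col to the original data
    bind_cols(dat, .)
}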

my_df <-
  data.frame(
    col_a = c(2, 2, 5, 5, 3),
    col_b = c(NA, 4, 2, 3, 1),
    col_c = c(4, 5, 3, 1, 2),
    col_d = c(1, NA, 4, 2, 4),
    col_e = c(3, 3, 1, 4, 5),
    extra_col = 1:5
  )

# explicitly provide column names
my_coalesce_func(dat = my_df, col_d, col_e)
#>   col_a col_b col_c col_d col_e extra_col new_col
#> 1     2    NA     4     1     3         1       1
#> 2     2     4     5    NA     3         2       3
#> 3     5     2     3     4     1         3       4
#> 4     5     3     1     2     4         4       2
#> 5     3     1     2     4     5         5       4

# use tidyselect helpers
my_coalesce_func(dat = my_df, starts_with("col"), -(col_a:col_c))
#>   col_a col_b col_c col_d col_e extra_col new_col
#> 1     2    NA     4     1     3         1       1
#> 2     2     4     5    NA     3         2       3
#> 3     5     2     3     4     1         3       4
#> 4     5     3     1     2     4         4       2
#> 5     3     1     2     4     5         5       4

Created on 2021-07-01 by the reprex package (v1.0.0)
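
The same idea can probably be written without the transmute()/bind_cols() round trip. The following is only a sketch under that assumption: my_coalesce_func2 is a hypothetical name used for illustration, and it swaps the !!! splice for base R's do.call(). It still relies on select() to provide the selecting context that the helpers need.

```r
library(dplyr)

# Hypothetical variant of the function above; the name is illustrative only.
my_coalesce_func2 <- function(dat, ...) {
  # select() resolves bare names and tidyselect helpers passed through `...`
  cols <- select(dat, ...)
  # hand the selected columns to coalesce() as separate, unnamed arguments
  dat$new_col <- do.call(coalesce, unname(as.list(cols)))
  dat
}

# same toy data as above
my_df <- data.frame(
  col_a = c(2, 2, 5, 5, 3),
  col_b = c(NA, 4, 2, 3, 1),
  col_c = c(4, 5, 3, 1, 2),
  col_d = c(1, NA, 4, 2, 4),
  col_e = c(3, 3, 1, 4, 5),
  extra_col = 1:5
)

# helpers and negative selections are both handled by select()
res <- my_coalesce_func2(my_df, starts_with("col"), -(col_a:col_c))
```

Here the selection reduces to col_d and col_e, so new_col should match the result of the second call above. Note that a selection matching no columns would leave coalesce() with no inputs, so a real function may want to check for that case first.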

the-mad-statter
  • You shouldn't directly copy/paste answers from other questions. If the question is a duplicate, you can mark it as such. – MrFlick Jul 01 '21 at 17:07
  • While I agree the questions were quite similar, the answer/code needed updating due to deprecation. – the-mad-statter Jul 01 '21 at 17:31
  • Then it would be better to add the improved answer to the other question. No need to have two posts with different answers about the same problem. – MrFlick Jul 01 '21 at 17:35