
Everything I can find online about tidyeval is either out of date with the latest version of tidyverse/dplyr or doesn't quite apply.

An example tibble is:

df <- tribble(
       ~var1, ~var2, ~var3,
         1,     2,     3,
         4,     5,     6,
         7,     8,     9
        )

I have a small function that I've written:

fun <- function(data, select_var, arrange_var) {
   select_var <- enquo(select_var)
   arrange_var <- enquo(arrange_var)

   data %>%
     select(!!select_var) %>%
     arrange(!!arrange_var)
   }

The function simply selects column(s) and then sorts the rows by the given variable(s).

When I pass the arguments to the function it works fine with a single variable inside of c():

fun(df, 
    c(var1,
      var2),
    c(var2))

However, when I try to pass it two variables like this:

    fun(df, 
    c(var1,
      var2),
    c(var1,
      var2))

I get the following error:

Error: incorrect size (282) at position 1, expecting : 141

The closest Stack Overflow answers I've been able to find are: arrange() doesn't recognize column name parameter and Pass a vector of variable names to arrange() in dplyr

but both of these seem to give answers that include deprecated solutions (e.g., arrange_()).

Some great information here: tidyeval resource roundup by Mara Averick

and Separating and Trimming Messy Data the Tidy Way by Paul Oldham

and of course I've dug into: tidyeval

However, none of them seem to address this quirk, and I've exhausted my resources after spending an afternoon on it. The code works fine in a standard R file; I just can't get it to work inside a function. I'm about ready to give up, so I thought I would see if you wonderful folks could help. Thanks in advance.

  • At least part of this is the different way that `select` and `arrange` behave. Compare just `select(df, c(var1, var2))` and `arrange(df, c(var1, var2))`. `select` gets a pass on vector input for historical reasons that I don't fully understand; it's more the exception than the rule. I am curious whether someone can figure out a way to use splicing to allow vector input, rather than dots capture, with `arrange` though. – Calum You Mar 08 '19 at 01:17
  • If you want to pass multiple variables you need to use `enquos` instead of `enquo` or `syms` instead of `sym` (if passing column names as strings). Then use the `!!!` instead of the `!!` as this splices the input and evaluates it. – Croote Mar 08 '19 at 02:08
  • It looks like `!!!` is soft deprecated per message just received: `Unquoting language objects with `!!!` is soft-deprecated as of rlang 0.3.0. Please use `!!` instead.` – Scott Davidson Mar 08 '19 at 03:35

1 Answer


Update 2022/03/17

The tidyverse has evolved and so should this answer.

There is no need for enquo anymore! Instead we enclose tidy-select expressions in double braces {{ }}.

library("tidyverse")

df <- tribble(
  ~var1, ~var2, ~var3,
  1, 2, 3,
  4, 5, 6,
  7, 8, 9
)

fun <- function(data, select_vars, ...) {
  data %>%
    select(
      {{ select_vars }}
    ) %>%
    arrange(
      ...
    )
}


fun(df, c(var1, var2), desc(var2))
#> # A tibble: 3 × 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     7     8
#> 2     4     5
#> 3     1     2
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 × 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8

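We still can't use c() with the arrange and filter verbs because they use data-masking rather than tidy-select. Under data-masking, c(var1, var2) is ordinary R code that concatenates the two columns into one vector of length 6, which doesn't match the 3 rows of the tibble.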

df %>%
  arrange(
    c(var1, var2)
  )
#> Error in `arrange()`:
#> ! Problem with the implicit `transmute()` step.
#> x Problem while computing `..1 = c(var1, var2)`.
#> x `..1` must be size 3 or 1, not 6.

Created on 2022-03-17 by the reprex package (v2.0.1)
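
If `...` is already needed for something else, one option is to wrap the embraced argument in `pick()`, which applies tidy-select rules inside a data-masking verb. This is only a sketch: it assumes dplyr 1.1.0 or later (where `pick()` is available), and the names `fun2`, `arrange_vars` and `df_unsorted` are made up for the illustration.

```r
library(dplyr)

# Rows deliberately out of order so the effect of the sort is visible
df_unsorted <- tribble(
  ~var1, ~var2, ~var3,
  2, 1, "a",
  1, 2, "b",
  1, 1, "c"
)

# Hypothetical helper: both arguments accept tidy-select expressions
fun2 <- function(data, select_vars, arrange_vars) {
  data %>%
    select({{ select_vars }}) %>%
    # pick() returns a data frame of the chosen columns;
    # arrange() then sorts by each of those columns in turn
    arrange(pick({{ arrange_vars }}))
}

sorted <- fun2(df_unsorted, c(var1, var2), c(var1, var2))
sorted
```

Because the sort happens after `select()`, the columns named in `arrange_vars` have to be among the selected ones. With dplyr versions before 1.1.0, `across()` can play the same role as `pick()` here.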

Old answer

Replacing arrange_var with ... and specifying the variables without enclosing them in c() makes it work.

library("dplyr")

df <- tribble(
  ~var1, ~var2, ~var3,
  1, 2, 3,
  4, 5, 6,
  7, 8, 9
)

fun <- function(data, select_var, ...) {
  select_var <- enquo(select_var)
  data %>%
    select(!!select_var) %>%
    # You can pass the dots to `arrange` directly
    arrange(...)
}

fun(df, c(var1, var2), var2)
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8

Created on 2019-03-08 by the reprex package (v0.2.1)

It turns out that only select supports strings and character vectors. As the documentation says, "This is unlike other verbs where strings would be ambiguous." See the last example for dplyr::select.

# Two alternatives; both work with `select`.
df %>%
  select(var1, var2)
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8
df %>%
  select(c(var1, var2))
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8

# `arrange` only works with a comma-separated list of unquoted variable names.
df %>%
  arrange(var1, var2)
#> # A tibble: 3 x 3
#>    var1  var2  var3
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
#> 2     4     5     6
#> 3     7     8     9
df %>%
  arrange(c(var, var2))
#> Error: incorrect size (4) at position 1, expecting : 3

Created on 2019-03-08 by the reprex package (v0.2.1)

dipetkov
  • Thank you! That's perfect. The whole enquo, enquos distinction is tough to wrap my head around. I appreciate the quick response. – Scott Davidson Mar 08 '19 at 03:47
  • I was interested in why there is a difference between how `dplyr` deals with `select` and `arrange`, and the documentation spells it out. As for `enquo` vs `enquos`: `enquo` is for a single quosure, `enquos` is for a list of quosures. – dipetkov Mar 08 '19 at 09:47
  • You can also just pass the dots to `arrange()`, no need for `enquos()`. – Lionel Henry Mar 08 '19 at 13:42
  • @lionel, Thanks. You are quite right and this makes the code cleaner. I will update the answer. – dipetkov Mar 08 '19 at 18:06
  • Also deleted the bit about `enquo` vs `enquos` since it is no longer necessary. – dipetkov Mar 08 '19 at 18:12
  • I found that `arrange(across(all_of({{sort_vars}})))` allows passing a vector of variable names, `sort_vars`, to a function. Based on [this answer](https://stackoverflow.com/questions/34487641/dplyr-groupby-on-multiple-columns-using-variable-names). No idea which is more "correct", but wanted to add the finding. – Hendy Aug 31 '22 at 00:52
  • @Hendy Thank you for pointing out that this post assumes that the variables are unquoted. It's true that quoted vars (such as "var1") need to be handled differently. – dipetkov Aug 31 '22 at 17:52
  • @dipetkov oh yikes. I completely missed that in my hunt for how to pass grouping/sorting variables via functions but... assumed these would have to be passed as characters, not bare variables! This stuff is very new to me and it still hurts my head that you can just pass `var1` even now that I better understand your answer! Like... what is being passed if `var1` isn't assigned? Anyway, thanks for clarifying. – Hendy Aug 31 '22 at 18:17
  • @Hendy There have been attempts to write code that works for both quoted and unquoted variables: [example](https://stackoverflow.com/questions/61490856/how-can-you-make-tidyverse-functions-that-support-both-quoted-and-unquoted-argum). Personally, I find it awkward and prefer unquoted variables, unless the variables are optional input arguments to a program (since arguments are parsed as strings). In any case, it wouldn't be fun if it were easy. – dipetkov Sep 01 '22 at 07:10
  • @dipetkov In my case, `...` is reserved for something else. Is there a way to work with `arrange_vars` and pass it to `dplyr::arrange`? – talegari Feb 20 '23 at 08:03
  • @talegari I'm not sure. With the tidyverse, there is usually a way, even if it ends up being not very elegant. Have you written a question, with a minimal reproducible example, that explicitly states the constraints of your problem setting? – dipetkov Feb 20 '23 at 08:28
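
For the quoted-names case raised in the comments above, a minimal sketch could look like the following. It assumes dplyr 1.1.0 or later for `pick()` (Hendy's comment uses `across()` for the same purpose), and `sort_by_names`, `sort_vars` and `df_unsorted` are hypothetical names chosen for the example.

```r
library(dplyr)

# Rows deliberately out of order so the effect of the sort is visible
df_unsorted <- tribble(
  ~var1, ~var2,
  2, 1,
  1, 2,
  1, 1
)

# Hypothetical helper: `sort_vars` is a character vector of column names
sort_by_names <- function(data, sort_vars) {
  data %>%
    # all_of() turns the strings into a tidy selection (and errors on
    # names that don't exist); pick() makes that selection usable
    # inside the data-masking verb arrange()
    arrange(pick(all_of(sort_vars)))
}

sorted_by_names <- sort_by_names(df_unsorted, c("var1", "var2"))
sorted_by_names
```

Here no embracing with `{{ }}` is needed, because `sort_vars` holds an ordinary character vector rather than an unevaluated expression.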