Within R, I use dplyr and more specifically arrange(). Somehow the arrange function doesn't work as expected.

In the example below first I store the name of a column, then I pass this variable as a parameter to a custom function called 'my_function'.

target_column = 'mean_age'

# below the function
my_function <- function(target_column, number){
    df <- read.csv('file.csv', stringsAsFactors=FALSE)
    df <- df[, c(1,4,10)]
    names(df) <-  c('place','state','mean_age')
    df1 <- df %>% group_by(state) %>% arrange(target_column) 
    df1 %>% summarise(rank = nth(target_column, number))        
}

R returns an error when 'my_function' is called due to the input to arrange():

"Error in arrange_impl(.data, dots) : incorrect size (1) at position 1, expecting : 4000"

When the column name is written directly in arrange(), rather than passed as a variable holding a string (as in the example above), arrange() accepts it.

df %>% group_by(state) %>% arrange(mean_age) 

How can I pass the parameter for the column name in a better way to 'my_function', so arrange() will recognize it?

lmo
Elyakim
  • In a simple case where you are using `arrange` in a function and want to pass a variable as a string, you can use `arrange_at` in place of `arrange`. Your case looks more complicated to me, what with `nth` and `summarise`, so using unquoting/quosures for programming may make more sense. – aosmith Nov 02 '17 at 20:00
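For the simple case mentioned in the comment above, a minimal sketch might look like the following. It uses the built-in mtcars data instead of the asker's file, and `sort_by_string` is a hypothetical helper name. Note that `arrange_at()` still exists in current dplyr but is marked as superseded.

```r
library(dplyr)

# Hypothetical helper: sort a data frame by a column named in a string.
# arrange_at() accepts a character vector of column names.
sort_by_string <- function(data, column_name) {
  data %>% arrange_at(column_name)
}

sorted_at <- sort_by_string(mtcars, "mpg")
head(sorted_at$mpg)
```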

3 Answers

You need to first parse your string argument to a quosure, then unquote it with !!:

library(dplyr)
library(rlang)

target_column = 'mean_age'

my_function <- function(target_column, number){
    target_quo = parse_quosure(target_column)

    df <- read.csv('file.csv', stringsAsFactors=FALSE)
    df <- df[, c(1,4,10)]
    names(df) <-  c('place','state','mean_age')
    df1 <- df %>% group_by(state) %>% arrange(!!target_quo) 
    df1 %>% summarise(rank = nth(!!target_quo, number))  # unquote here too; the bare string is not a column
}

my_function('mean_age', 10)

If you want to be able to supply target_column as an unquoted column name, you can use enquo instead:

my_function <- function(target_column, number){
    target_quo = enquo(target_column)

    df <- read.csv('file.csv', stringsAsFactors=FALSE)
    df <- df[, c(1,4,10)]
    names(df) <-  c('place','state','mean_age')
    df1 <- df %>% group_by(state) %>% arrange(!!target_quo) 
    df1 %>% summarise(rank = nth(!!target_quo, number))  # unquote here too so nth() sees the column
}

my_function(mean_age, 10)

Note:

Verbs such as select() also accept a string captured with enquo, but arrange() does not: it treats the string as a length-1 value rather than a column name. So the following call does not work with the second version of the function:

my_function('mean_age', 10)

Below is a toy example to demonstrate what I mean, since OP's question is not reproducible:

library(dplyr)
library(rlang)

test_func = function(var){
    var_quo = parse_quosure(var)
    mtcars %>%
      select(!!var_quo) %>%
      arrange(!!var_quo)
}

test_func2 = function(var){
  var_quo = enquo(var)
  mtcars %>%
    select(!!var_quo) %>%
    arrange(!!var_quo)
}

Results:

> test_func("mpg") %>%
+   head()
   mpg
1 10.4
2 10.4
3 13.3
4 14.3
5 14.7
6 15.0

> test_func2(mpg) %>%
+   head()
   mpg
1 10.4
2 10.4
3 13.3
4 14.3
5 14.7
6 15.0

> test_func2("mpg") %>%
+   head()

Error in arrange_impl(.data, dots) : incorrect size (1) at position 1, expecting : 32

acylam

The good answer by @avid_useR needs an update because 'rlang::parse_quosure' is now deprecated.

In short, to make 'dplyr::arrange' sort by a column whose name is given as a string (or as a variable holding a string), convert the string to a symbol and unquote it:

target_column = rlang::sym('mean_age')
df %>% group_by(state) %>% arrange(!!target_column)

Or as a one-liner (if you only need it once):

df %>% group_by(state) %>% arrange(!!rlang::sym(target_column))
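
Since the snippets above depend on the asker's `df`, here is a self-contained sketch of the same idea on the built-in mtcars data. The helper name `sort_by_name` is hypothetical and used only for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: sort a data frame by a column whose name is a string.
sort_by_name <- function(data, column_name) {
  column_sym <- sym(column_name)   # convert the string to a symbol
  data %>% arrange(!!column_sym)   # unquote the symbol inside arrange()
}

sorted <- sort_by_name(mtcars, "mpg")
head(sorted$mpg)
```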
Agile Bean

2022/03/17 The tidyverse has evolved and so should this answer. The tidy eval functions enquo with unquoting (!!), sym/ensym, etc. are no longer the recommended approach.

library("tidyverse")

# Simulate data
read_df <- function(n = 100) {
  set.seed(1234)
  tibble(
    state = sample(c("A", "B", "C"), n, replace = TRUE),
    mean_age = rnorm(n)
  )
}

Case 1. If the target column is given as a string, use the .data pronoun, i.e., .data[[column_name]], inside summarise(), and across(all_of(column_name)) inside arrange().

my_function <- function(column_name, number) {
  read_df() %>%
    group_by(state) %>%
    arrange(
      # Use `across(all_of())` instead of `across()` even with a single column
      # Otherwise will get the following warning:
      # > Using an external vector in selections is ambiguous
      across(all_of(column_name))
    ) %>%
    summarise(
      rank = nth(.data[[column_name]], number)
    )
}

my_function("mean_age", 10)
#> # A tibble: 3 × 2
#>   state   rank
#>   <chr>  <dbl>
#> 1 A     -0.420
#> 2 B     -0.584
#> 3 C     -0.141

Case 2. If the target column is given as an unquoted column name, there is no need for enquo anymore! Instead, enclose the function argument in double braces {{ }}, aka embrace it.

my_function <- function(column_var, number) {
  read_df() %>%
    group_by(state) %>%
    arrange(
      {{ column_var }}
    ) %>%
    summarise(
      rank = nth({{ column_var }}, number)
    )
}

my_function(mean_age, 10)
#> # A tibble: 3 × 2
#>   state   rank
#>   <chr>  <dbl>
#> 1 A     -0.420
#> 2 B     -0.584
#> 3 C     -0.141

Created on 2022-03-17 by the reprex package (v2.0.1)

dipetkov
  • This answer resolves the issue for me, @dipetkov, thank you. Curious to me is why `dplyr::select({{ var }})` works, but `arrange({{ var }})`, doesn’t (though your solution, with `.data[[{{ var}}]]` does work). – rdelrossi Apr 22 '23 at 22:26
  • @rdelrossi Do you mean that `var` is a string which refers to a column name? For example, `var` = "column_a". Then `select(df, {{var}})` selects the column but `arrange(df, {{var}})` doesn't sort by the column? The difference is that `select` uses [tidy-selection](https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html) and `arrange` uses [data-masking](https://rlang.r-lib.org/reference/args_data_masking.html). – dipetkov Apr 22 '23 at 23:10
  • See more here: [Programming with dplyr](https://dplyr.tidyverse.org/articles/programming.html). From a practical point of view, I sometimes have to try one way or the other to get it right. – dipetkov Apr 22 '23 at 23:10
  • Well said on both counts, @dipetkov. Appreciate your reply. This is likely a thread I’ll return to next time I face this. – rdelrossi Apr 23 '23 at 07:39
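
The tidy-selection versus data-masking distinction discussed in these comments can be sketched as follows, assuming the column name is held in a string. This uses the built-in mtcars data, and the variable names are illustrative only.

```r
library(dplyr)

var <- "mpg"

# select() uses tidy-selection: all_of() turns a character vector
# of column names into a selection.
selected <- mtcars %>% select(all_of(var))

# arrange() uses data-masking: .data[[var]] looks up the column
# whose name is stored in the string.
arranged <- mtcars %>% arrange(.data[[var]])

names(selected)
head(arranged$mpg)
```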