6

I don't know why passing an argument from my custom function to group_by doesn't work. I pass a column name from the dataset as colName, and when I run the function I get this error: Must group by variables found in .data. Column 'colName' is not found. The example below uses the quakes dataset that ships with R:

foo <- function(data, colName) {
  
  result <- data %>%
   group_by(colName) %>%
   summarise(count = n()) 

  return(result)
}

foo(quakes, "stations")

# I also tried passing the name without quotes, but that does not work either:
# foo(quakes, stations)

I noticed that it works when I pass the column name to group_by explicitly:

group_by(stations) %>%

However, it doesn't make sense to hardcode the column name inside the function.

user438383
mustafa00
  • All answers in this post should work - https://stackoverflow.com/questions/48219732/pass-a-string-as-variable-name-in-dplyrfilter – Ronak Shah May 04 '21 at 10:33

4 Answers

6

Here is another way of making it work. You can use the .data[[colName]] construct when the column name is stored as a string:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows

If you decide not to pass colName as a string, you can wrap it in a pair of curly braces ({{ }}) inside your function to get the same result:

foo <- function(data, colName) {
  
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n()) 
  
  return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
   stations count
      <int> <int>
 1       10    20
 2       11    28
 3       12    25
 4       13    21
 5       14    39
 6       15    34
 7       16    35
 8       17    38
 9       18    33
10       19    29
# ... with 92 more rows
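
As a quick sanity check, the two variants above can be compared side by side. This is only a sketch: the function names count_by_string and count_by_symbol are hypothetical and are used here so that both versions can exist at the same time instead of both being called foo.

```r
library(dplyr)

# Variant 1: the column name arrives as a string (hypothetical name).
count_by_string <- function(data, colName) {
  data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
}

# Variant 2: the column name arrives unquoted (hypothetical name).
count_by_symbol <- function(data, colName) {
  data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
}

by_string <- count_by_string(quakes, "stations")
by_symbol <- count_by_symbol(quakes, stations)

# Compare the two results as plain data frames.
identical(as.data.frame(by_string), as.data.frame(by_symbol))
```

Which variant to pick mostly depends on how the function is called: the string version is convenient when the column name comes from user input (for example a Shiny input), while the curly-brace version reads more naturally in interactive code.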
Anoushiravan R
  • 1
    For some reason your solution works best in my case. I am writing a modularized Shiny app; I don't know why `get(colName)` doesn't work there, but your solution does. Thanks – mustafa00 May 04 '21 at 10:15
  • 1
    You're welcome. `get` is a very useful function for retrieving an object's value; however, I guess that if you are going to write functions using tidy evaluation you need to use @Peter's solution or the ones I mentioned. These are very useful, and if you want to find out more you can type `vignette("programming")` in the console and read the documentation. – Anoushiravan R May 04 '21 at 10:23
4

I believe you simply need to wrap the variable name in get.

foo <- function(data, colName) {
  
  result <- data %>%
   dplyr::group_by(get(colName)) %>%
   dplyr::summarise(count = n()) 

  return(result)
}
> foo(quakes, "stations")
# A tibble: 102 x 2
   `get(colName)` count
            <int> <int>
 1             10    20
 2             11    28
 3             12    25
 4             13    21
 5             14    39
 6             15    34
 7             16    35
 8             17    38
 9             18    33
10             19    29
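
Note that the grouping column in the output above is named `get(colName)` rather than stations. One possible way to keep the original name, sketched here as an assumption rather than as part of the answer, is to name the grouping expression with `!!colName :=`. The function name foo_named is hypothetical.

```r
library(dplyr)

foo_named <- function(data, colName) {
  data %>%
    # `!!colName :=` uses the string stored in colName as the column name,
    # while get(colName) still looks up the column values.
    dplyr::group_by(!!colName := get(colName)) %>%
    dplyr::summarise(count = n())
}

names(foo_named(quakes, "stations"))
```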

user438383
  • I am developing a Shiny app. I don't know if that is the reason, but when I use `get` this error is shown: _Problem with mutate() input ..1. x Config file config.yml not found in current working directory or parent directories i Input ..1 is get(colName)_. Anyway, your solution works fine outside the Shiny app. – mustafa00 May 04 '21 at 10:13
3

Another option is to use ensym and unquote the result with !!, so that the function accepts both quoted and unquoted arguments:

foo <- function(data, colName) {
  data %>%
    dplyr::group_by(!! rlang::ensym(colName)) %>%
    dplyr::summarise(count = n())
}

foo(quakes, stations)
foo(quakes, "stations")
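
To see why both calls behave the same, it can help to look at what ensym returns on its own. The helper capture below is hypothetical and exists only for illustration; the sketch assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical helper: returns whatever ensym() captures from its argument.
capture <- function(colName) {
  ensym(colName)
}

from_bare   <- capture(stations)    # argument given unquoted
from_string <- capture("stations")  # argument given as a string

# In both cases the result is a symbol, which `!!` then inserts into group_by().
is.symbol(from_bare)
is.symbol(from_string)
identical(from_bare, from_string)
```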
akrun
2

With dplyr try:

library(dplyr)

foo <- function(data, colName) {

  # Convert the string into a symbol so it can be unquoted with !!
  colName <- sym(colName)

  result <- data %>%
    group_by(!!colName) %>%
    summarise(count = n())

  return(result)
}


foo(quakes, "stations")
#> # A tibble: 102 x 2
#>    stations count
#>       <int> <int>
#>  1       10    20
#>  2       11    28
#>  3       12    25
#>  4       13    21
#>  5       14    39
#>  6       15    34
#>  7       16    35
#>  8       17    38
#>  9       18    33
#> 10       19    29
#> # ... with 92 more rows

Created on 2021-05-04 by the reprex package (v2.0.0)

Peter