Somewhat hard to define this question without sounding like lots of similar questions!

I have a function in which one parameter should be the name of a function, e.g. "mean" or "sum", to be applied inside dplyr::summarise:

library(dplyr)
data(mtcars)

f <- function(x = mtcars,
              groupcol = "cyl",
              zCol = "disp",
              zFun = "mean") {

  zColquo <- quo_name(zCol)

  cellSummaries <- x %>%
    group_by(gear, !!sym(groupcol)) %>% # 1 preset grouper, 1 user-defined
    summarise(Count = n(), # 1 preset summary, 1 user-defined
              !!zColquo := mean(!!sym(zColquo))) %>% # mean should be zFun, user-defined
    ungroup()

  cellSummaries
}

(this groups by gear and cyl, then returns, per group, count and mean(disp))

Per my note, I'd like 'mean' to be dynamic, performing the function defined by zFun, but I can't for the life of me work out how to do it! Thanks in advance for any advice.

dez93_2000

2 Answers

You can use match.fun to make the function dynamic. I also removed zColquo as it's not needed.

library(dplyr)
library(rlang)

f <- function(x = mtcars,
              groupcol = "cyl",
              zCol = "disp",
              zFun = "mean") {

  cellSummaries <- x %>%
                   group_by(gear, !!sym(groupcol)) %>% 
                   summarise(Count = n(), 
                             !!zCol := match.fun(zFun)(!!sym(zCol))) %>%
                   ungroup

  return(cellSummaries)
}

You can then check the output:

f()

# A tibble: 8 x 4
#   gear   cyl Count  disp
#  <dbl> <dbl> <int> <dbl>
#1     3     4     1  120.
#2     3     6     2  242.
#3     3     8    12  358.
#4     4     4     8  103.
#5     4     6     4  164.
#6     5     4     2  108.
#7     5     6     1  145 
#8     5     8     2  326 

f(zFun = "sum")

# A tibble: 8 x 4
#   gear   cyl Count  disp
#  <dbl> <dbl> <int> <dbl>
#1     3     4     1  120.
#2     3     6     2  483 
#3     3     8    12 4291.
#4     4     4     8  821 
#5     4     6     4  655.
#6     5     4     2  215.
#7     5     6     1  145 
#8     5     8     2  652 
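
A side note on match.fun: it returns its argument unchanged when that argument is already a function, so the same body can accept either "median" or median. A minimal sketch of that idea follows; f2 is a hypothetical name, and .groups = "drop" assumes dplyr 1.0.0 or later (it replaces the trailing ungroup):

```r
library(dplyr)

# Hypothetical variant of f(): zFun may be a function name or a function object.
f2 <- function(x = mtcars,
               groupcol = "cyl",
               zCol = "disp",
               zFun = "mean") {

  fun <- match.fun(zFun)  # resolve once, before entering summarise()

  x %>%
    group_by(gear, !!sym(groupcol)) %>%
    summarise(Count = n(),
              !!zCol := fun(!!sym(zCol)),
              .groups = "drop")  # assumption: dplyr >= 1.0.0
}

by_name     <- f2(zFun = "median")                        # character name
by_function <- f2(zFun = median)                          # function object
by_lambda   <- f2(zFun = function(v) max(v) - min(v))     # anonymous function
```

Resolving the function before the pipeline also means a misspelled name fails straight away with match.fun's own error rather than inside summarise.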
Ronak Shah

We can also use get to look the function up by name:

library(dplyr)

f <- function(x = mtcars,
              groupcol = "cyl",
              zCol = "disp",
              zFun = "mean") {

  zColquo <- quo_name(zCol)
  x %>%
    group_by(gear, !!sym(groupcol)) %>% # 1 preset grouper, 1 user-defined
    summarise(Count = n(), # 1 preset summary, 1 user-defined
              !!zColquo := get(zFun)(!!sym(zCol))) %>%
    ungroup()
}

f()
# A tibble: 8 x 4
#   gear   cyl Count  disp
#  <dbl> <dbl> <int> <dbl>
#1     3     4     1  120.
#2     3     6     2  242.
#3     3     8    12  358.
#4     4     4     8  103.
#5     4     6     4  164.
#6     5     4     2  108.
#7     5     6     1  145 
#8     5     8     2  326 


f(zFun = "sum")
# A tibble: 8 x 4
#   gear   cyl Count  disp
#  <dbl> <dbl> <int> <dbl>
#1     3     4     1  120.
#2     3     6     2  483 
#3     3     8    12 4291.
#4     4     4     8  821 
#5     4     6     4  655.
#6     5     4     2  215.
#7     5     6     1  145 
#8     5     8     2  652 

In addition, we could remove the sym evaluation in group_by and in summarise if we wrap with across:

f <- function(x = mtcars,
              groupcol = "cyl",
              zCol = "disp",
              zFun = "mean") {

  x %>%
    # all_of() marks groupcol/zCol as character vectors of column names;
    # newer tidyselect versions warn when they are passed bare
    group_by(across(c(gear, all_of(groupcol)))) %>% # 1 preset grouper, 1 user-defined
    summarise(Count = n(), # 1 preset summary, 1 user-defined
              across(all_of(zCol), ~ get(zFun)(.))) %>%
    ungroup()
}

f()
# A tibble: 8 x 4
#   gear   cyl Count  disp
#  <dbl> <dbl> <int> <dbl>
#1     3     4     1  120.
#2     3     6     2  242.
#3     3     8    12  358.
#4     4     4     8  103.
#5     4     6     4  164.
#6     5     4     2  108.
#7     5     6     1  145 
#8     5     8     2  326 
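
Since across accepts several columns and a named list of functions, the same pattern stretches to more than one summary per call. A rough sketch, not a definitive version: g, zCols and zFuns are hypothetical names, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: zCols and zFuns are character vectors of
# column names and function names respectively.
g <- function(x = mtcars,
              groupcol = "cyl",
              zCols = c("disp", "hp"),
              zFuns = c("mean", "sum")) {

  # Named list of functions; the names become column suffixes
  funs <- setNames(lapply(zFuns, match.fun), zFuns)

  x %>%
    group_by(across(c(gear, all_of(groupcol)))) %>%
    summarise(Count = n(),
              across(all_of(zCols), funs),
              .groups = "drop")
}

res <- g()
names(res)
# "gear" "cyl" "Count" "disp_mean" "disp_sum" "hp_mean" "hp_sum"
```

With a named list, across builds the output names as column_function by default, which is why the result no longer reuses the bare column name.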
akrun
  • thanks! Any reason that match.fun() is better or worse than get()? – dez93_2000 Jul 15 '20 at 01:46
  • @dez93_2000 You could pass unquoted `sum` or `mean` in `zFun` and then you don't need any other function, i.e. `zFun(!!sym(zCol))` should work – akrun Jul 15 '20 at 01:49
  • thanks again. Never thought of using get() for functions! Your across() approach is interesting - I'm still getting my head around that new implementation. – dez93_2000 Jul 15 '20 at 02:21
  • @dez93_2000 thanks. With `across`, you can pass multiple functions to blocks of columns, and by default it will return the same column name. Thus, some of the evaluations can be simplified – akrun Jul 15 '20 at 02:23
  • That was what most appealed when I heard about it; what's intriguing me is 1. why does using across in group_by obviate the need to !!sym() the groupcol, and 2. that second use of across is top tier wizardry! Is the tilde purrr style? It seems like across(zCol, get(zFun)) should work?? – dez93_2000 Jul 15 '20 at 02:40
  • @dez93_2000 In `summarise` you can use multiple columns like `summarise(across(c(Zcol, Zcol2), ~ get(zFun)(.)))`. The `~` is for lambda function `function(x)` – akrun Jul 15 '20 at 02:42
  • @dez93_2000 yes, it would work without the `~`. I used it just to make it more flexible in case you want to add more parameters and it is more readable – akrun Jul 15 '20 at 02:44
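
On the match.fun() versus get() question above: one difference that can be checked in plain R is that get() with default arguments returns the first object of that name, whatever its type, while match.fun() and get(mode = "function") only return functions. A small sketch with a hypothetical demo() function that shadows mean with a number:

```r
demo <- function() {
  mean <- 5  # a non-function object that shadows base::mean in this frame

  list(
    plain_get  = get("mean"),                     # first object named "mean": the number 5
    strict_get = get("mean", mode = "function"),  # skips objects that are not functions
    matched    = match.fun("mean")                # likewise only returns a function
  )
}

out <- demo()
is.numeric(out$plain_get)    # TRUE
is.function(out$strict_get)  # TRUE
is.function(out$matched)     # TRUE
```

So get(zFun) is fine when nothing else in scope shares the function's name; match.fun(zFun) or get(zFun, mode = "function") is the more defensive choice.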