
I want to define a custom function that groups and summarises some data using dplyr and, conditional on a Boolean flag, can group by an additional level. I can achieve this using a full if ... else block, as in this trivial example:

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  if (by_age) {
    bar <- Titanic %>%
      group_by(Survived, Age)
  } else {
    bar <- Titanic %>%
      group_by(Survived)
  }
  
  bar %>%
    summarise(n = sum(n))
}

foo()
foo(by_age = TRUE)

But this seems a clumsy way of doing it. Is there a way to achieve this with a single block of dplyr code, conditionally adding Age as a second grouping variable? I've tried ifelse(by_age, Age, NA) in my group_by() call, and some of the techniques listed in this SO post, but to no avail.
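
A quick way to see why the ifelse() attempt cannot work, sketched under the assumption of dplyr ≥ 1.0: ifelse() returns a result shaped like its test, so with a single TRUE/FALSE flag it returns a single value (the first Age, or NA), which group_by() then recycles into a constant column. The column name g is just an illustrative label.

```r
library(dplyr)

Titanic <- as_tibble(datasets::Titanic)

# ifelse() returns something the same length as its *test*; here the test is a
# single TRUE, so only the first element of Age comes back. group_by() recycles
# that length-1 value, giving a constant column rather than a real second level.
by_age <- TRUE
res <- Titanic %>%
  group_by(Survived, g = ifelse(by_age, Age, NA)) %>%
  summarise(n = sum(n), .groups = "drop")

res
n_distinct(res$g)
```

The result still has only one row per Survived value, so the "extra" grouping level is doing nothing useful.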

AnilGoyal
Tom Wagstaff

3 Answers

Edit

Sorry, I didn't read your linked SO post. If you want to avoid the ... approach for some reason, here is one potential solution:

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived, if(by_age) Age) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived `if (by_age) Age`     n
#>   <chr>    <chr>             <dbl>
#> 1 No       Adult              1438
#> 2 No       Child                52
#> 3 Yes      Adult               654
#> 4 Yes      Child                57

Created on 2022-07-07 by the reprex package (v2.0.1)

To avoid the Age column being named "if (by_age) Age", you can use:

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived, !!sym(ifelse(by_age, "Age", ""))) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived Age       n
#>   <chr>    <chr> <dbl>
#> 1 No       Adult  1438
#> 2 No       Child    52
#> 3 Yes      Adult   654
#> 4 Yes      Child    57

Created on 2022-07-07 by the reprex package (v2.0.1)
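
A sketch of a variant that avoids relying on sym("") (an empty symbol standing in for a missing argument): build the grouping columns as a character vector and splice them in with !!! and rlang::syms(). foo_syms is a hypothetical name, and .groups = "drop" is my addition to silence the summarise() message.

```r
library(dplyr)
library(rlang)

Titanic <- as_tibble(datasets::Titanic)

# `foo_syms` is an illustrative name, not taken from the answer above
foo_syms <- function(by_age = FALSE) {
  # c("Survived", NULL) is just "Survived", so "Age" is only added when by_age is TRUE
  grp <- c("Survived", if (by_age) "Age")
  Titanic %>%
    # syms() turns the strings into symbols; !!! splices them in as separate arguments
    group_by(!!!syms(grp)) %>%
    summarise(n = sum(n), .groups = "drop")
}

foo_syms()
foo_syms(by_age = TRUE)
```

Because the grouping columns are an ordinary character vector, the output columns keep their real names (Survived, Age), and the same pattern extends to any number of optional columns.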

Original answer

One solution is to use ... (dot-dot-dot) to pass in extra grouping variables only when you want them, e.g.:

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(...) {
  Titanic %>%
    group_by(Survived, ...) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711
foo(Age)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived Age       n
#>   <chr>    <chr> <dbl>
#> 1 No       Adult  1438
#> 2 No       Child    52
#> 3 Yes      Adult   654
#> 4 Yes      Child    57

# You can also pass in multiple 'extra' arguments
foo(Age, Sex)
#> `summarise()` has grouped output by 'Survived', 'Age'. You can override using
#> the `.groups` argument.
#> # A tibble: 8 × 4
#> # Groups:   Survived, Age [4]
#>   Survived Age   Sex        n
#>   <chr>    <chr> <chr>  <dbl>
#> 1 No       Adult Female   109
#> 2 No       Adult Male    1329
#> 3 No       Child Female    17
#> 4 No       Child Male      35
#> 5 Yes      Adult Female   316
#> 6 Yes      Adult Male     338
#> 7 Yes      Child Female    28
#> 8 Yes      Child Male      29

Created on 2022-07-07 by the reprex package (v2.0.1)

NB: Using ... comes with two downsides:

  • When you use it to pass arguments to another function, you have to carefully explain to the user where those arguments go. This makes it hard to understand what you can do with functions like lapply() and plot().
  • A misspelled argument will not raise an error. This makes it easy for typos to go unnoticed (from Advanced R; https://adv-r.hadley.nz/functions.html?q=...#fun-dot-dot-dot)
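
To make the second point concrete for this question, here is a sketch with one extra named argument (foo_dots and drop_groups are hypothetical, not part of the answer). Because drop_groups comes after ..., R only matches it exactly, so a typo is captured by ... and group_by() quietly turns it into a grouping column instead of raising an error.

```r
library(dplyr)

Titanic <- as_tibble(datasets::Titanic)

# Hypothetical variant of the `...` answer with an extra named argument,
# added only to illustrate the "misspelled argument" caveat
foo_dots <- function(..., drop_groups = FALSE) {
  Titanic %>%
    group_by(Survived, ...) %>%
    summarise(n = sum(n), .groups = if (drop_groups) "drop" else "keep")
}

# Intended call: the result is ungrouped
a <- foo_dots(Age, drop_groups = TRUE)

# Typo: `drop_group` is not matched to `drop_groups` (arguments after ... need
# an exact match), so it lands in `...` and becomes a constant grouping column
b <- foo_dots(Age, drop_group = TRUE)
names(b)
group_vars(b)
```

A misspelled column name such as foo(Agee) does still fail, because group_by() cannot find that column; it is misspelled argument names that slip through.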
jared_mamrot
  • It's a bit of a shame in your new solution to have the `Age` column in the summary renamed to `if (by_age) Age` – Richard Berry Jul 07 '22 at 10:45
  • Yes, I thought it was a good answer, but in OP's post they said they 'tried the techniques listed in this SO post', and the dot-dot-dot approach was one of the answers in the link, so I assume OP doesn't want to use it for some reason. Regardless, it's up to OP to decide what suits their use case – jared_mamrot Jul 07 '22 at 10:52
  • Fixed the "`Age` column in the summary renamed to `if (by_age) Age`" problem @RichardBerry - thanks for pointing that out to me – jared_mamrot Jul 07 '22 at 11:19

You can do this using curly-curly ({{ }}) from the rlang package, giving the additional grouping variable a default of NULL:

library(dplyr)
library(rlang)

data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(grp = NULL) {
  Titanic %>%
    group_by(Survived, {{grp}}) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711

foo(Age)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived Age       n
#>   <chr>    <chr> <dbl>
#> 1 No       Adult  1438
#> 2 No       Child    52
#> 3 Yes      Adult   654
#> 4 Yes      Child    57

Created on 2022-07-07 by the reprex package (v2.0.1)
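
For anyone wondering what {{ }} does here: the rlang documentation describes it as shorthand for !!enquo(), so the answer above should behave the same as this spelled-out sketch (foo_enquo is a hypothetical name). When grp is not supplied, enquo() captures the default NULL, and group_by() simply ignores a NULL grouping expression, which is what foo() relies on.

```r
library(dplyr)
library(rlang)

Titanic <- as_tibble(datasets::Titanic)

# `foo_enquo` is an illustrative name; it spells out what {{ grp }} does
foo_enquo <- function(grp = NULL) {
  # capture the caller's expression (or the default NULL) as a quosure
  grp <- enquo(grp)
  Titanic %>%
    # inject the quosure; a NULL expression adds no grouping column
    group_by(Survived, !!grp) %>%
    summarise(n = sum(n), .groups = "drop")
}

foo_enquo()
foo_enquo(Age)
```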

shafee

One approach is to split the group_by() into two group_by() calls, adding Age with .add = TRUE only when by_age is TRUE.

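library(dplyr)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived) %>%
    # braces stop %>% inserting the data as the first argument; `.` is the grouped data
    { if (by_age) group_by(., Age, .add = TRUE) else . } %>%
    summarise(n = sum(n), .groups = "drop")
}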

giving:

foo()
## # A tibble: 2 x 2
##   Survived     n
##   <chr>    <dbl>
## 1 No        1490
## 2 Yes        711

foo(TRUE)
## # A tibble: 4 x 3
##   Survived Age       n
##   <chr>    <chr> <dbl>
## 1 No       Adult  1438
## 2 No       Child    52
## 3 Yes      Adult   654
## 4 Yes      Child    57
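
The .add = TRUE argument is what makes this work. A minimal sketch (assuming dplyr ≥ 1.0.0, where .add replaced the older add argument) contrasting the two behaviours:

```r
library(dplyr)

Titanic <- as_tibble(datasets::Titanic)

# Without .add = TRUE, a second group_by() replaces the existing grouping
replaced <- Titanic %>% group_by(Survived) %>% group_by(Age)
group_vars(replaced)
# [1] "Age"

# With .add = TRUE, Age is appended to the existing Survived grouping
added <- Titanic %>% group_by(Survived) %>% group_by(Age, .add = TRUE)
group_vars(added)
# [1] "Survived" "Age"
```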
G. Grothendieck