The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.

Here is a common idiom I use when building reporting and aggregation functions with dplyr:

my_report <- function(data, grouping_vars) {
  data %>%
    group_by_(.dots=grouping_vars) %>%
    summarize(x_mean=mean(x), x_median=median(x), ...)
}

Here, grouping_vars is a vector of strings.

I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, and it's not too bad for interactive work either.

However, in the new programming with dplyr vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples of how passing strings is no longer the correct approach, and I have to use quosures instead.

I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.

Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:

library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#>      am mean_cyl
#>   <dbl>    <dbl>
#> 1     0 6.947368
#> 2     1 5.076923

grouping_vars <- "am"
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#>   `"am"` mean_cyl
#>    <chr>    <dbl>
#> 1     am   6.1875
Paul
  • It's easier to help if you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data that can be used to test possible solutions. – MrFlick Apr 14 '17 at 16:31
  • @MrFlick I just tried setting one up, but in the process I found I couldn't even get the quosure-based example to work. So I filed a github issue https://github.com/tidyverse/dplyr/issues/2661 – Paul Apr 14 '17 at 16:55
  • Got a response and updated my post accordingly. – Paul Apr 14 '17 at 17:03

3 Answers


dplyr will have a specialized group_by variant, group_by_at, for dealing with multiple grouping variables. It is much easier to use this new member of the _at family:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
    group_by_at(.vars = cols) %>%
    summarise(mean_cyl=mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
# 
# am  gear mean_cyl
# <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

The .vars argument accepts a character vector, a numeric vector, or a list of columns generated by vars(). From the documentation:

.vars

A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
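
As a rough sketch of how the question's my_report() idiom might look with group_by_at, assuming the summary columns (cyl_mean, cyl_median) and the use of mtcars are illustrative choices rather than anything from the original function:

```r
library(dplyr)

# Hypothetical rewrite of the question's my_report() using group_by_at().
# grouping_vars is a character vector of column names.
my_report <- function(data, grouping_vars) {
  data %>%
    group_by_at(.vars = grouping_vars) %>%
    summarise(cyl_mean = mean(cyl), cyl_median = median(cyl))
}

# Strings in, grouped summary out.
by_strings <- my_report(mtcars, c("am", "gear"))

# The same grouping expressed with vars() and bare column names.
by_vars <- mtcars %>%
  group_by_at(vars(am, gear)) %>%
  summarise(cyl_mean = mean(cyl), cyl_median = median(cyl))
```

Both calls should produce the same four-row summary. Note that in later dplyr releases the scoped _at verbs are superseded by across(), so group_by(across(all_of(grouping_vars))) may be the preferred spelling there.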

mt1022
  • Switching my checkmark since this looks like it will be the standard way going forward. – Paul Apr 14 '17 at 18:07

Here's the quick and dirty reference I wrote for myself.

# install.packages("rlang")
library(tidyverse)

dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))

Representing column names with strings

Convert strings to symbol objects using rlang::sym and rlang::syms.

summ_var <- "value"
group_vars <- c("cat", "cat2")

summ_sym <- rlang::sym(summ_var)  # capture a single symbol
group_syms <- rlang::syms(group_vars)  # creates list of symbols

dat %>%
  group_by(!!!group_syms) %>%  # splice list of symbols into a function call
  summarize(summ = sum(!!summ_sym)) # unquote a single symbol into the call

If you use !! or !!! outside of functions that support quasiquotation (such as the dplyr verbs), R treats them as ordinary negation and you will typically get an error.
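
A small sketch of that difference, assuming rlang::expr() as a stand-in for any quasiquotation-aware function (the variable names built and outside are just for illustration):

```r
library(rlang)

summ_sym <- sym("value")

# Inside a quasiquotation-aware function such as expr(),
# !! substitutes the symbol into the expression being built.
built <- expr(sum(!!summ_sym))
print(built)
#> sum(value)

# At the top level, !! is parsed as two applications of the logical-not
# operator, and negating a symbol is an error. tryCatch() captures the
# error message instead of stopping the script.
outside <- tryCatch(!!summ_sym, error = function(e) conditionMessage(e))
print(outside)
```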

The usage of rlang::sym and rlang::syms is identical inside functions.

summarize_by <- function(df, summ_var, group_vars) {

  summ_sym <- rlang::sym(summ_var)
  group_syms <- rlang::syms(group_vars)

  df %>%
    group_by(!!!group_syms) %>%
    summarize(summ = sum(!!summ_sym))
}

We can then call summarize_by with string arguments.

summarize_by(dat, "value", c("cat", "cat2"))

Using non-standard evaluation for column/variable names

summ_quo <- quo(value)  # capture a single variable for NSE
group_quos <- quos(cat, cat2)  # capture list of variables for NSE

dat %>%
  group_by(!!!group_quos) %>%  # use !!! with both quos and rlang::syms
  summarize(summ = sum(!!summ_quo))  # use !! with both quo and rlang::sym

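Inside functions, use enquo rather than quo to capture an argument. quos(...) still works inside functions, because arguments forwarded through ... are captured together with the environment they came from.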

summarize_by <- function(df, summ_var, ...) {

  summ_quo <- enquo(summ_var)  # can only capture a single value!
  group_quos <- quos(...)  # captures multiple values, also inside functions!?

  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}

And then our function call is

summarize_by(dat, value, cat, cat2)
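
The two approaches can also be combined. The sketch below assumes the bare-name version of the function is renamed summarize_by_nse (a hypothetical name, only to avoid clashing with the string version) and adds a seed purely for reproducibility. It passes strings to the bare-name interface by unquoting symbols at the call site:

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))

# Bare-name version from above, under a hypothetical new name.
summarize_by_nse <- function(df, summ_var, ...) {
  summ_quo <- enquo(summ_var)   # capture the single summary variable
  group_quos <- quos(...)       # capture all grouping variables
  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}

# Called with bare column names.
bare <- summarize_by_nse(dat, value, cat, cat2)

# Called with strings, converted to symbols and unquoted at the call site.
from_strings <- summarize_by_nse(
  dat,
  !!rlang::sym("value"),
  !!!rlang::syms(c("cat", "cat2"))
)
```

Both calls should return the same summary, so a function written against bare names does not have to be rewritten to accept strings.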
alexpghayes

If you want to group by possibly more than one column, you can use quos:

grouping_vars <- quos(am, gear)
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

Right now, it doesn't seem like there's a great way to turn strings into quos. Here's one way that does work, though:

cols <- c("am","gear")
grouping_vars <- rlang::parse_quosures(paste(cols, collapse=";"))
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
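
An alternative sketch that avoids parsing text at all: build a symbol from each string and wrap it in a quosure. The helper name strings_to_quos and the default environment are assumptions for illustration, not part of rlang or dplyr:

```r
library(dplyr)

# Hypothetical helper: turn a character vector of column names into a
# list of quosures. The environment rarely matters here, because the
# symbols are looked up in the data frame first.
strings_to_quos <- function(strs, env = rlang::global_env()) {
  lapply(rlang::syms(strs), rlang::new_quosure, env = env)
}

cols <- c("am", "gear")

grouped <- mtcars %>%
  group_by(!!!strings_to_quos(cols))

group_vars(grouped)
```

Because each string becomes a symbol rather than parsed code, a column name is never interpreted as an arbitrary expression.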
MrFlick
  • @joran on my end, your version without `rlang::parse_quosures` yields a groupby with only `am`, not `am` and `gear` together. Is it different for you? – Paul Apr 14 '17 at 17:13
  • @Paul No, you're both right, I wasn't looking carefully. – joran Apr 14 '17 at 17:15
  • @MrFlick this wouldn't be that bad if the parse_quosures boilerplate were wrapped in a function: `as_quosure <- function(strs) rlang::parse_quosures(paste(strs, collapse=";"))`. Then you can just do `group_by(!!!as_quosure(cols))` – Paul Apr 14 '17 at 17:18
  • @Paul Or I just wish `rlang::parse_quosures` could take a proper vector rather than having to collapse. I'd be surprised if there weren't a good convenience function forthcoming – MrFlick Apr 14 '17 at 17:19
  • @Paul Indeed, as I was wrapping my head around this I was just thinking that this new method seems _awfully_ targeted at writing functions that take bare column names as arguments. Feels oddly focused on writing _interactive_ functions rather than functions generally. – joran Apr 14 '17 at 17:21
  • What I liked about the old dplyr function was that they translated the NSE versions to SE versions which are much easier to work with/program against. This new version seems to go all-in on non-standard craziness. – MrFlick Apr 14 '17 at 17:22
  • @joran MrFlick That's what I thought at first too, but I think hadley's genius idea is that the robustness of the new NSE/quosure framework is supposed to make it easy to "use NSE for SE". This example is a good initial kick of the wheels - hopefully it will continue to hold up. I'm going to raise this in a github issue. – Paul Apr 14 '17 at 17:25
  • @Paul I was so annoyed with my answer that I did the same: https://github.com/tidyverse/dplyr/issues/2662 – MrFlick Apr 14 '17 at 17:30
  • `syms` may be helpful, e.g. `(function(data, ...){ data %>% group_by(!!!rlang::syms(list(...))) %>% summarize(mpg = mean(mpg)) })(mtcars, 'cyl', 'am')` – alistaire Apr 14 '17 at 19:21
  • @alistaire. Yeah, that's what Hadley pointed out on the github issue. But I think `group_by_at` is a "better" solution. – MrFlick Apr 14 '17 at 19:23
  • Oops, clicked too late. I still can't find one that goes straight from dots of strings to quosures/symbols without `list`, though. – alistaire Apr 14 '17 at 19:25