
I'm writing a function where the user is asked to define one or more grouping variables in the function call. The data is then grouped using dplyr and it works as expected if there is only one grouping variable, but I haven't figured out how to do it with multiple grouping variables.

Example:

x <- c("cyl")
y <- c("cyl", "gear")
dots <- list(~cyl, ~gear)

library(dplyr)
library(lazyeval) 

mtcars %>% group_by_(x)             # groups by cyl
mtcars %>% group_by_(y)             # groups only by cyl (not gear)
mtcars %>% group_by_(.dots = dots)  # groups by cyl and gear, this is what I want.

I tried to convert y into the same form as dots using:

mtcars %>% group_by_(.dots = interp(~var, var = list(y)))
#Error: is.call(expr) || is.name(expr) || is.atomic(expr) is not TRUE

How can I use a user-supplied character vector of more than one variable name (like y in the example) to group the data using dplyr?

(This question is somewhat related to this one, but it is not answered there.)

talat
3 Answers


There is no need for interp here; just use as.formula to convert the strings to formulas:

dots = sapply(y, . %>% {as.formula(paste0('~', .))})
mtcars %>% group_by_(.dots = dots)

The reason why your interp approach doesn’t work is that the expression gives you back the following:

~list(c("cyl", "gear"))

– not what you want. You could, of course, sapply interp over y, which would be similar to using as.formula above:

dots1 = sapply(y, . %>% {interp(~var, var = .)})

But, in fact, you can also directly pass y:

mtcars %>% group_by_(.dots = y)

The dplyr vignette on non-standard evaluation goes into more detail and explains the difference between these approaches.
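
For readers unfamiliar with the `. %>% {...}` shorthand used above: in magrittr, a pipeline that starts with `.` defines a function, and the `.` inside the braces stands for that function's argument. A minimal sketch of roughly the same conversion with an explicit anonymous function, assuming `y` as defined in the question (the argument name `name` is just illustrative):

```r
y <- c("cyl", "gear")

# Build one one-sided formula per column name, e.g. "cyl" becomes ~cyl.
# lapply() always returns a list, which is the shape `.dots` expects.
dots <- lapply(y, function(name) as.formula(paste0("~", name)))

dots
```

The resulting list can be passed to `.dots` in the same way as the `sapply` result.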

Konrad Rudolph
  • @David I wouldn’t use it. I’m just showing it to answer OP’s direct question of how to convert a character vector into a vector of formulas. That said, there *is* a difference (the formulas come with an environment attached), and in general the dplyr documentation recommends using formulas over character strings. However, in this particular case the environment is a bit useless. – Konrad Rudolph Dec 29 '14 at 12:40
  • Thanks so much for this write-up, @Konrad-Rudolph. It saved me a lot of pain. I'm having trouble understanding your sapply function... what are the "." in it? – SFuj Aug 25 '15 at 18:25
  • `group_by_` is now deprecated; you can now use `group_by_at(vars(...))`. See [this answer](https://stackoverflow.com/a/44954739/1193577) to a related question (note that the call to `one_of()` in that answer may be unnecessary). – knowah Aug 15 '19 at 14:08

slice_rows() from the purrrlyr package (https://github.com/hadley/purrrlyr) groups a data.frame by taking a vector of column names (strings) or positions (integers):

y <- c("cyl", "gear")
mtcars_grp <- mtcars %>% purrrlyr::slice_rows(y)

class(mtcars_grp)
#> [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

group_vars(mtcars_grp)
#> [1] "cyl"  "gear"

This is particularly useful now that group_by_() has been deprecated.

wjchulme

Seems like one of these is what you want:

# one variable (as a string):
mtcars %>% group_by(.data[[x]])            # groups by cyl
# OR
mtcars %>% group_by(across(all_of(x)))     # groups by cyl

# multiple variables (as a character vector):
mtcars %>% group_by(across(all_of(y)))     # groups by cyl and gear

See: Programming with dplyr
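
Since the question is about a function that takes the grouping variables as an argument, here is a minimal sketch of how the `across(all_of())` pattern might be wrapped. It assumes dplyr 1.0.0 or later; `count_by` and `group_vars` are hypothetical names chosen for illustration, and the summary step is just a placeholder.

```r
library(dplyr)

# Hypothetical helper: group `data` by the columns named in the character
# vector `group_vars`, then count the rows in each group.
count_by <- function(data, group_vars) {
  data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(n = n(), .groups = "drop")
}

res_one  <- count_by(mtcars, "cyl")             # one grouping variable
res_many <- count_by(mtcars, c("cyl", "gear"))  # several grouping variables
```

Because `all_of()` is strict, a misspelled column name raises an error instead of being silently dropped.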

Brian D