Pass column names as strings to group_by and summarize

Question

As of dplyr 0.7, the verbs ending in an underscore, such as summarize_ and group_by_, are deprecated, since we are supposed to use quosures instead.

See: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

I am trying to implement the following example using quo and !!

Working example:

library(dplyr)

df <- data.frame(x = c("a","a","a","b","b","b"), y = c(1,1,2,2,3,3), z = 1:6)

lFG <- df %>%
  group_by(x, y)
lFG %>% summarize(min(z))

However, in my case the columns to group by and to summarize are specified as strings:

cols2group <- c("x","y")
col2summarize <- "z"

How can I get the same example as above working?

score 19 · Accepted Answer · edited Jan 13 '22 at 11:26

For this you can now use the _at versions of the verbs:

df %>%  
  group_by_at(cols2group) %>% 
  summarize_at(.vars = col2summarize, .funs = min)
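As a quick sanity check, the sketch below compares this string-based pipeline with the hard-coded version from the question. The names `expected` and `actual` are introduced here for illustration only, and the `.groups` argument assumes dplyr 1.0.0 or later. The explicit ungroup() is there because summarize_at, like summarize, drops only the last grouping level.

```r
library(dplyr)

df <- data.frame(x = c("a","a","a","b","b","b"), y = c(1,1,2,2,3,3), z = 1:6)
cols2group <- c("x", "y")
col2summarize <- "z"

# Hard-coded column names, as in the question
expected <- df %>%
  group_by(x, y) %>%
  summarize(z = min(z), .groups = "drop")

# Column names supplied as strings
actual <- df %>%
  group_by_at(cols2group) %>%
  summarize_at(.vars = col2summarize, .funs = min) %>%
  ungroup()

# Compare the two results as plain data frames
all.equal(as.data.frame(expected), as.data.frame(actual))
```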

Edit (2021-06-09):

Please see Ronak Shah's answer, which uses

summarise(across(all_of(col2summarize), min))

This is now the preferred option.

answered Oct 24 '17 at 19:51 by Robin Gertenbach · edited Jan 13 '22 at 11:26 by xilliam

score 11 · Answer 2 · answered Apr 02 '21 at 03:14

From dplyr 1.0.0 you can use across :

library(dplyr)

cols2group <- c("x","y")
col2summarize <- "z"

df %>%
  group_by(across(all_of(cols2group))) %>%
  summarise(across(all_of(col2summarize), min)) %>%
  ungroup()

#   x       y     z
#  <chr> <dbl> <int>
#1 a         1     1
#2 a         2     3
#3 b         2     4
#4 b         3     5

answered by Ronak Shah

Why do you need the `all_of` inside the across? I just used it without and it works as expected. And it works for Spark! – kael Jun 24 '21 at 10:17

It will work as expected, but it will give you a message (once per session): "Note: Using an external vector in selections is ambiguous. ℹ Use `all_of(cols2group)` instead of `cols2group` to silence this message." – Ronak Shah Jun 24 '21 at 10:19
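To illustrate the ambiguity that message refers to, here is a small sketch. The data frame `df2` is hypothetical and deliberately contains a column that is itself named `cols2group`; depending on the installed tidyselect version, the bare-name call may also print the note or a deprecation warning.

```r
library(dplyr)

# Hypothetical data frame that happens to contain a column named `cols2group`
df2 <- data.frame(x = c("a", "a", "b"), cols2group = c("p", "q", "q"), z = 1:3)
cols2group <- "x"

# Bare name: the data frame column `cols2group` takes precedence
bare <- df2 %>% group_by(across(cols2group)) %>% group_vars()

# all_of(): the external character vector is used, so the grouping is by `x`
strict <- df2 %>% group_by(across(all_of(cols2group))) %>% group_vars()

bare
strict
```

With all_of() the selection always comes from the external vector, and a name that is missing from the data raises an error instead of being silently reinterpreted.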

score 4 · Answer 3 · answered Nov 18 '20 at 19:12

Another option is to use non-standard evaluation (NSE) and have R interpret the strings as names of objects:

cols2group <- c("x","y")
col2summarize <- "z"

df %>%
  group_by(!!!rlang::syms(cols2group)) %>%
  summarize(min(!!rlang::sym(col2summarize)))

rlang::sym() turns a single string into a symbol, which !! then unquotes so that it is used as a name in the context of df, where it refers to the relevant column. Because cols2group holds more than one name, it needs the plural rlang::syms(), which returns a list of symbols, together with the splice operator !!!. There are different ways of doing the same thing, as always, and this is the shorthand I tend to use!
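A minimal sketch of wrapping this idea in a reusable function. It assumes rlang's syms() with the splice operator !!! for the vector of grouping columns, since sym() accepts only a single string. The helper name `min_by` and the `min_` prefix for the output column are made up here for illustration, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: group by the columns named in `group_cols` and
# take the minimum of the column named in `value_col`
min_by <- function(data, group_cols, value_col) {
  out_name <- paste0("min_", value_col)  # name of the summary column
  data %>%
    group_by(!!!syms(group_cols)) %>%
    summarize(!!out_name := min(!!sym(value_col)), .groups = "drop")
}

df <- data.frame(x = c("a","a","a","b","b","b"), y = c(1,1,2,2,3,3), z = 1:6)
res <- min_by(df, c("x", "y"), "z")
res
```

Giving the summary column an explicit name through `:=` avoids the default name that summarize() would otherwise derive from the unquoted expression.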

score 1 · Answer 4 · answered Dec 20 '20 at 14:42

See ?dplyr::across for the updated way to do this, since group_by_at and summarize_at are now superseded.

answered by Nicolas Molano
