Use column index instead of name in group_by

Question

I want to summarize a dataframe with dplyr, like so:

> test <-data.frame(ID = c("A", "A", "B", "B"), val = c(1:4))
> test %>% group_by(ID) %>% summarize(av = mean(val))
# A tibble: 2 x 2
      ID    av
  <fctr> <dbl>
1      A   1.5
2      B   3.5

But suppose that instead of grouping by the column called "ID" I wish to group by the first column, regardless of its name. Is there a simple way to do that?

I've tried a few naive approaches (group_by(1), group_by(.[1]), group_by(., .[1]), group_by(names(.)[1])) to no avail. I'm only just beginning to use tidyverse packages, so I may be missing something obvious.

This question is very similar, but it's about mutate and I wasn't able to generalize it to my problem. This question is also similar, but the accepted answer is to use a different package, and I'm trying to stick with dplyr.

Psidom · Accepted Answer · 2017-09-26T22:37:12.257

23

You can use one of the scoped variants (group_by_at) for this:

test %>% group_by_at(1) %>% summarise(av = mean(val))

# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5

edited Sep 26 '17 at 22:37

answered Sep 26 '17 at 22:27

Psidom


score 14 · Answer 2 · answered Jun 17 '20 at 16:28

You can use the across functionality as of version 1.0.0:

library(dplyr)
test %>% 
  group_by(across(1)) %>% 
  summarise(av = mean(val))
## A tibble: 2 x 2
#  ID       av
#  <fct> <dbl>
#1 A       1.5
#2 B       3.5
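
If the position is held in a variable rather than typed inline, wrapping it in all_of() avoids ambiguity with a column of the same name. The following is only a sketch: summarise_by_position and its pos argument are hypothetical names, and the value column is assumed to be called val, as in the question.

```r
library(dplyr)

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)

# Hypothetical helper: group by the column(s) at position `pos`,
# then average the `val` column (assumed to exist in `dat`).
summarise_by_position <- function(dat, pos) {
  dat %>%
    group_by(across(all_of(pos))) %>%
    summarise(av = mean(val))
}

# Group by the first column, whatever its name is
res <- summarise_by_position(test, 1)
res
```

The grouping column keeps its original name in the result, so downstream code can still refer to it by name if needed.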

answered Jun 17 '20 at 16:28

Ian Campbell


score 4 · Answer 3 · edited Jun 17 '20 at 16:26

In older versions of dplyr, you could use standard evaluation with dplyr::group_by_:

test %>% 
 group_by_(names(.)[1]) %>% 
 summarize(av = mean(val))
## A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5

edited Jun 17 '20 at 16:26

Gregor Thomas


answered Sep 26 '17 at 22:05

LyzandeR


Standard evaluation is now [deprecated](http://dplyr.tidyverse.org/reference/se-deprecated.html). – Dan Sep 27 '17 at 00:05

score 3 · Answer 4 · answered Sep 27 '17 at 03:58

If we need to use NSE, then sym and !! can be used:

test %>%
     group_by(!! rlang::sym(names(.)[1])) %>%
     summarise(av = mean(val))
# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5

We can also create a function. If we pass quoted strings, we use sym with !!; otherwise, for bare column names, we go the enquo/!! route.

f1 <- function(dat, grp, valueCol) {
     dat %>%
        group_by(!! rlang::sym(grp)) %>%
        summarise(av = mean(!! rlang::sym(valueCol)))
}

f1(test, "ID", "val")
# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
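
For completeness, the enquo/!! route mentioned above might look like the sketch below. The name f2 is hypothetical; here the columns are passed as bare names rather than quoted strings.

```r
library(dplyr)

test <- data.frame(ID = c("A", "A", "B", "B"), val = 1:4)

# f2 is a hypothetical counterpart to f1 that takes bare column names.
f2 <- function(dat, grp, valueCol) {
  # Capture the unevaluated arguments as quosures
  grp <- enquo(grp)
  valueCol <- enquo(valueCol)
  dat %>%
    group_by(!!grp) %>%
    summarise(av = mean(!!valueCol))
}

res <- f2(test, ID, val)
res
```

Note that this variant still needs the column name, so it does not by itself answer the "first column regardless of name" part; it only shows the other calling convention.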

This is more complicated than I actually need for my problem, but it started me down a fruitful learning path about evaluation and [programming with dplyr](http://dplyr.tidyverse.org/articles/programming.html) — Joe, Sep 27 '17 at 15:48

score 0 · Answer 5 · answered Aug 25 '23 at 06:56

As group_by_at has now been superseded, it is better to use either across or pick. The use of across has already been shown. For pick, use this syntax:

library(dplyr)
test %>% 
  group_by(pick(1)) %>% 
  summarise(av = mean(val))
## A tibble: 2 x 2
#  ID       av
#  <fct> <dbl>
#1 A       1.5
#2 B       3.5

More information on this super helpful pick can be seen here.
