I want to summarize a dataframe with dplyr, like so:

> test <-data.frame(ID = c("A", "A", "B", "B"), val = c(1:4))
> test %>% group_by(ID) %>% summarize(av = mean(val))
# A tibble: 2 x 2
      ID    av
  <fctr> <dbl>
1      A   1.5
2      B   3.5

But suppose that instead of grouping by the column called "ID" I wish to group by the first column, regardless of its name. Is there a simple way to do that?

I've tried a few naive approaches (group_by(1), group_by(.[1]), group_by(., .[1]), group_by(names(.)[1])), all to no avail. I'm only just beginning to use tidyverse packages, so I may be missing something obvious.

This question is very similar, but it's about mutate and I wasn't able to generalize it to my problem. This question is also similar, but the accepted answer is to use a different package, and I'm trying to stick with dplyr.

Joe

5 Answers

You can use one of the scoped variants (group_by_at) for this:

test %>% group_by_at(1) %>% summarise(av = mean(val))

# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
Psidom

You can use the across functionality as of version 1.0.0:

library(dplyr)
test %>% 
  group_by(across(1)) %>% 
  summarise(av = mean(val))
## A tibble: 2 x 2
#  ID       av
#  <fct> <dbl>
#1 A       1.5
#2 B       3.5
Ian Campbell

In older versions of dplyr, you could use standard evaluation with dplyr::group_by_:

test %>% 
 group_by_(names(.)[1]) %>% 
 summarize(av = mean(val))
## A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
Gregor Thomas
LyzandeR
  • Standard evaluation is now [deprecated](http://dplyr.tidyverse.org/reference/se-deprecated.html). – Dan Sep 27 '17 at 00:05

If we need to use NSE, then sym and !! can be used:

test %>%
     group_by(!! rlang::sym(names(.)[1])) %>%
     summarise(av = mean(val))
# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5

We can also wrap this in a function. If the column names are passed as quoted strings, use sym with !!; if they are passed unquoted, use the enquo/!! route instead.

f1 <- function(dat, grp, valueCol) {
     dat %>%
        group_by(!! rlang::sym(grp)) %>%
        summarise(av = mean(!! rlang::sym(valueCol)))
}

f1(test, "ID", "val")
# A tibble: 2 x 2
#      ID    av
#  <fctr> <dbl>
#1      A   1.5
#2      B   3.5
akrun
  • This is more complicated than I actually need for my problem, but it started me down a fruitful learning path about evaluation and [programming with dplyr](http://dplyr.tidyverse.org/articles/programming.html) – Joe Sep 27 '17 at 15:48

As group_by_at has now been superseded, it is better to use either across or pick. The use of across has already been shown. For pick (available from dplyr 1.1.0), use this syntax:

library(dplyr)
test %>% 
  group_by(pick(1)) %>% 
  summarise(av = mean(val))
## A tibble: 2 x 2
#  ID       av
#  <fct> <dbl>
#1 A       1.5
#2 B       3.5

More information on this super helpful pick can be seen here.
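
As a rough sketch of how the same idea extends to several columns, pick accepts a range of positions. The data frame test2 and its grp column are invented for this illustration, and the code assumes dplyr 1.1.0 or later.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() is available

# test2 is a hypothetical data frame with two leading grouping columns
test2 <- data.frame(
  ID  = c("A", "A", "B", "B"),
  grp = c("x", "y", "x", "x"),
  val = 1:4
)

res <- test2 %>%
  group_by(pick(1:2)) %>%                        # group by the first two columns, whatever their names
  summarise(av = mean(val), .groups = "drop")    # drop grouping after summarising
```

Here res should contain one row per distinct combination of the first two columns.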

AnilGoyal