6

I'm dealing with a big dataframe that has a number of columns I want to group by. I'd like to do something like this:

output <- df %>% 
  group_by(starts_with("GEN", ignore.case=TRUE),x,y) %>% 
  summarize(total=n()) %>% 
  arrange(desc(total))

is there a way to do this? Maybe with group_by_at or some other similar function?

zx8754
  • 52,746
  • 12
  • 114
  • 209
Andrew P
  • 63
  • 4
  • 1
    Did you try it with `group_by_at`? If so, what happened? Without a [reproducible example](https://stackoverflow.com/q/5963269/5325862) it's hard to do better than guess. Something I have to remind myself often is that nothing bad will happen if I try writing code that doesn't end up working—it seems like you have a good guess, might as well try it out – camille Sep 28 '20 at 19:04

2 Answers2

8

To use starts_with() in group_by(), you need to wrap it in across(). Here is an example using some built data.

library(dplyr)
mtcars %>%
group_by(across(starts_with("c"))) %>%
summarize(total = n()) %>%
arrange(-total)

# A tibble: 9 x 3
# Groups:   cyl [3]
    cyl  carb total
  <dbl> <dbl> <int>
1     4     2     6
2     8     4     6
3     4     1     5
4     6     4     4
5     8     2     4
6     8     3     3
7     6     1     2
8     6     6     1
9     8     8     1
Ben Norris
  • 5,639
  • 2
  • 6
  • 15
3

Yes, there is. You could use the group_by_at function:

mtcars %>% group_by_at(vars(starts_with("c"), gear))

Group by all columns whose name starts with "c" and by the column gear

Output

# A tibble: 32 x 11
# Groups:   cyl, carb, gear [12]
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ... with 22 more rows
smichal
  • 141
  • 4