How to group by all but one columns?

Question

How do I tell group_by to group the data by all columns except a given one?

With aggregate, it would be aggregate(x ~ ., ...).

I tried group_by(data, -x), but that groups by the negative-of-x (i.e. the same as grouping by x).

http://stackoverflow.com/a/28182288/4604054 is one way to do this, its quite long winded in comparison to aggregate. — zacdav, Aug 27 '16 at 12:41

score 100 · Accepted Answer · edited Dec 03 '20 at 02:54

100

dplyr version 1.0+

In dplyr 1.0.0 coming up, the _at functions are falling into the superseded lifecycle (i.e. while they remain in dplyr for the foreseeable future, there are now better alternatives that are more actively developed). The new way to accomplish this is via the across function:

df %>%
  group_by(across(c(-hp)))

dplyr v 0.7+

A small update on this question because I stumbled across this myself and found an elegant solution with current version of dplyr (0.7.4): Inside group_by_at(), you can supply the names of columns the same way as in the select() function using vars(). This enables us to group by everything but one column (hp in this example) by writing:

library(dplyr)
df <- as_tibble(mtcars, rownames = "car")
df %>% group_by_at(vars(-hp))

edited Dec 03 '20 at 02:54

Gregor Thomas

136,190
20
167
294

answered Mar 18 '18 at 14:24

Jannik Buhr

1,788
2
11
11

4

You may even supply several columns to ignore: `df %>% group_by_at(vars(-hp, -cyl))` without needing to use the c() construct. Super nice! – Lionel Trebuchon Jun 03 '19 at 13:43
5

it even works `group_by(across(!hp)`. Would be great if this could be marked as the answer now. – s_pike Oct 09 '20 at 06:18

score 40 · Answer 2 · edited Aug 17 '17 at 16:19

Building on the @eipi10's dplyr 0.7.0 edit, group_by_at appears to be the right function for this job. However, if you are simply looking to omit column "x", then you can use:

new2.0 <- dat %>%
  group_by_at(vars(-x)) %>%
  summarize(mean_value = mean(value))

Using @eipi10's example data:

# Fake data
set.seed(492)
dat <- data.frame(value = rnorm(1000),
             g1 = sample(LETTERS, 1000, replace = TRUE),
             g2 = sample(letters, 1000, replace = TRUE),
             g3 = sample(1:10, replace = TRUE),
             other = sample(c("red", "green", "black"), 1000, replace = TRUE))

new <- dat %>% 
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue = mean(value))


new2.0 <- dat %>% 
  group_by_at(vars(-value)) %>% 
  summarize(meanValue = mean(value))

identical(new, new2.0)
# [1] TRUE

score 35 · Answer 3 · edited Jan 17 '20 at 14:47

You can do this using standard evaluation (group_by_ instead of group_by):

# Fake data
set.seed(492)
dat = data.frame(value=rnorm(1000), g1=sample(LETTERS,1000,replace=TRUE),
                 g2=sample(letters,1000,replace=TRUE), g3=sample(1:10, replace=TRUE),
                 other=sample(c("red","green","black"),1000,replace=TRUE))

dat %>% group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
1       A      a     2  green  0.89281475
2       A      b     2    red -0.03558775
3       A      b     5  black -1.79184218
4       A      c    10  black  0.17518610
5       A      e     5  black  0.25830392
...

See this vignette for more on standard vs. non-standard evaluation in dplyr.

UPDATE for `dplyr` 0.7.0

To address @ÖmerAn's comment: It looks like group_by_at is the way to go in dplyr 0.7.0 (someone please correct me if I'm wrong about this). For example:

dat %>% 
  group_by_at(setdiff(names(dat), "value")) %>%
  summarise(meanValue=mean(value))

# Groups:   g1, g2, g3 [?]
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
 1      A      a     2  green  0.89281475
 2      A      b     2    red -0.03558775
 3      A      b     5  black -1.79184218
 4      A      c    10  black  0.17518610
 5      A      e     5  black  0.25830392
 6      A      e     5    red -0.81879788
 7      A      e     7  green  0.30836054
 8      A      f     2  green  0.05537047
 9      A      g     1  black  1.00156405
10      A      g    10  black  1.26884303
# ... with 949 more rows

Let's confirm both methods give the same output (in dplyr 0.7.0):

new = dat %>% 
  group_by_at(setdiff(names(dat), "value")) %>%
  summarise(meanValue=mean(value))

old = dat %>% 
  group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

identical(old, new)
# [1] TRUE

Instead of `names(dat)[-grep("value", names(dat))]`, you can also use `setdiff(names(dat), "value")` — Jaap, Aug 28 '16 at 17:16
Although it's risker, I guess you could even select by position: `names(dat)[-1]`. We're golfing, right? :) — eipi10, Aug 28 '16 at 17:30
Also, if you want to summarise all columns at once use `dplyr::summarise_each(funs(mean))` — Boern, Apr 24 '17 at 08:28
How to do this in the newest version of `dplyr 0.7.0` which claims the SE versions of the verbs are now deprecated? https://github.com/tidyverse/dplyr/releases — Ömer An, Jun 22 '17 at 06:54
As @eipi10 pointed out, is there an advantage to `group_by_at` over chaining `select(-value) %>% group_by()`? I thought group_by_at might keep the excluded column, but it drops the excluded column. — CPak, Jun 22 '17 at 20:23

How to group by all but one columns?

3 Answers3

dplyr version 1.0+

dplyr v 0.7+

UPDATE for `dplyr` 0.7.0

Linked

Related

How to group by all but one columns?

3 Answers3

dplyr version 1.0+

dplyr v 0.7+

UPDATE for dplyr 0.7.0

Linked

Related

UPDATE for `dplyr` 0.7.0