Can dplyr summarise over several variables without listing each one?

Question

dplyr is amazingly fast, but I wonder if I'm missing something: is it possible to summarise over several variables without listing each one? For example:

library(dplyr)
library(reshape2)

df <- data.frame(
  sex = factor(rep(c("boy", "girl"), each = 2L)),
  age = c(52L, 58L, 40L, 62L),
  bmi = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)
df

   sex age bmi chol
1  boy  52  25  187
2  boy  58  23  220
3 girl  40  30  190
4 girl  62  26  204

dg=group_by(df,sex)

With this small dataframe, it's easy to write

summarise(dg,mean(age),mean(bmi),mean(chol))

And I know that to get what I want, I could melt, get the means, and then dcast such as

dm=melt(df, id.vars='sex')
dmg=group_by(dm, sex, variable)
x=summarise(dmg, means=mean(value))
# name the value column explicitly so dcast does not have to guess it
dcast(x, sex~variable, value.var='means')

But what if I have >20 variables and a very large number of rows? Is there anything similar to .SD in data.table that would allow me to take the means of all variables in the grouped data frame? Or is it possible to somehow use lapply on the grouped data frame?

Thanks for any help

I think the `data.table` solution will be the fastest and most efficient here. But you can also have a nice "`reshape2` only" solution: `dcast(melt(df, id = "sex"), sex ~ variable, fun.aggregate = mean)` – dickoa, Jan 23 '14 at 00:06

rrs · Answer 1 · 2021-08-20T20:59:18.943

120

As has been mentioned by several folks, mutate_each() and summarise_each() are deprecated in favour of the new across() function.

Answer as of dplyr version 1.0.5:

df %>%
  group_by(sex) %>%
  summarise(across(everything(), mean))
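
If only some columns should be summarised, or more than one function is needed, `across()` accepts a column selection and a named list of functions. The following is a minimal sketch rather than part of the original answer; it assumes dplyr 1.0.0 or later, and the choice of `sd` as a second summary and the `.names` pattern are illustrative assumptions.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  sex  = factor(rep(c("boy", "girl"), each = 2L)),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# Summarise only the numeric columns and apply two functions to each.
# .names controls the output column names (age_mean, age_sd, ...).
res <- df %>%
  group_by(sex) %>%
  summarise(
    across(where(is.numeric), list(mean = mean, sd = sd),
           .names = "{.col}_{.fn}"),
    .groups = "drop"
  )
res
```

The grouping column `sex` is excluded from the selection automatically, so it never needs to be dropped by hand.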

Original answer:

dplyr now has summarise_each:

df %>% 
  group_by(sex) %>% 
  summarise_each(funs(mean))

edited Aug 20 '21 at 20:59

answered Jun 27 '14 at 15:25

rrs


1

Version update of `summarise_each` alternatives can be found here: http://stackoverflow.com/a/39284283/5088194 – leerssej Dec 18 '16 at 03:21
3

Yes, as `summarise_each` has been deprecated you may now want to use `summarise_all` or something similar for the OP's application. – DirtStats Oct 20 '17 at 20:12
1

`summarise_each` has been deprecated. `df %>% group_by(sex) %>% summarise(across(everything(), mean))` – Jason Mathews Jul 23 '21 at 19:37

mnel · Accepted Answer · 2014-01-23T05:36:35.280

44

The data.table idiom is lapply(.SD, mean), which is

library(data.table)
DT <- data.table(df)
DT[, lapply(.SD, mean), by = sex]
#     sex age bmi  chol
# 1:  boy  55  24 203.5
# 2: girl  51  28 197.0
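
When only a subset of columns should be averaged, `.SDcols` restricts what `.SD` contains. This is a hedged sketch built on the question's data; the vector `cols` and the particular columns chosen are assumptions for illustration.

```r
library(data.table)

# Same toy data as in the question
DT <- data.table(
  sex  = factor(rep(c("boy", "girl"), each = 2L)),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# Hypothetical subset of columns to summarise
cols <- c("age", "chol")

# .SD holds only the columns named in .SDcols for each group
res <- DT[, lapply(.SD, mean), by = sex, .SDcols = cols]
res
```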

I'm not sure of a dplyr idiom for the same thing, but you can do something like

dg <- group_by(df, sex)
# the names of the columns you want to summarize
cols <- names(dg)[-1]
# the dots component of your call to summarise
dots <- sapply(cols, function(x) substitute(mean(x), list(x = as.name(x))))
do.call(summarise, c(list(.data=dg), dots))
# Source: local data frame [2 x 4]

#    sex age bmi  chol
# 1  boy  55  24 203.5
# 2 girl  51  28 197.0
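
In current dplyr the same "column names held in a character vector" pattern does not require building the call by hand. The sketch below assumes dplyr 1.0.0 or later and uses `across()` with `all_of()`; it is an alternative to the `do.call()` approach above, not the method this answer originally described.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  sex  = factor(rep(c("boy", "girl"), each = 2L)),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# Names of the columns to summarise, computed programmatically
cols <- setdiff(names(df), "sex")

# all_of() selects exactly the columns named in the character vector
res <- df %>%
  group_by(sex) %>%
  summarise(across(all_of(cols), mean), .groups = "drop")
res
```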

Note that there is a GitHub issue, #178, to efficiently implement the plyr idiom colwise in dplyr.

edited Jan 23 '14 at 05:36

answered Jan 22 '14 at 23:34

mnel


3

I'd say that's currently the best you can do with dplyr. The only change I'd make is to replace `sapply()` with `lapply()` since there's no simplification happening. – hadley Jan 23 '14 at 13:01
1

Note there is now summarize_each() and mutate_each() in dplyr: http://finzi.psych.upenn.edu/library/dplyr/html/summarise_each.html – Bar Jun 13 '16 at 22:56
