Aggregate multiple columns according to some others

Question

Having data in a data.frame, I would like to aggregate some columns (using any general function) grouping by some others, keeping the remaining ones as they are (or even omitting them). The fashion is to recall the group by function in SQL. As an example let us assume we have

df <- data.frame(a=rnorm(4), b=rnorm(4), c=c("A", "B", "C", "A"))

and I want to sum (say) the values in column a and average (say) the values in column b, grouping by the symbols in column c. I am aware it is possible to achieve such using apply, cbind or similars, specifying the functions you want to use, but I was wondering if there were a smarter (one line) way (especially using the aggregate function) to do so.

this has been asked a gazillion times, see here http://stackoverflow.com/questions/18799901/data-frame-group-by-column and here http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column and here http://stats.stackexchange.com/questions/8225/how-to-summarize-data-by-group-in-r — grrgrrbla, Sep 16 '15 at 09:53
@grrgrrbla Please vote to close as a duplicate in this case. — Tim Biegeleisen, Sep 16 '15 at 09:55
possible duplicate of [Using aggregate for multiple aggregations](http://stackoverflow.com/questions/12064202/using-aggregate-for-multiple-aggregations) — Colonel Beauvel, Sep 16 '15 at 09:55
I have seen all the above and they all refer to aggregating only one column, which can be easily done with different syntax. — gented, Sep 16 '15 at 10:09
10 seconds of search work ("aggregate by multiple columns"): http://stackoverflow.com/questions/9723208/aggregate-multiple-variables-simultaneously ; — grrgrrbla, Sep 16 '15 at 10:31
If you read the above, it only refers to one kind of function (sum) for two columns, or to the use of ``cbind``, that I want to avoid. — gented, Sep 16 '15 at 10:35
It's not an elegant solution, but it works: `merge(aggregate( a ~ c,df,mean), aggregate( b ~ c, df, sum))`. I'm pretty sure that there must be better ways to obtain this result though. — RHertel, Sep 16 '15 at 10:37
`library(data.table); setDT(df)[ , .(sum(A), mean(B)), by = C]` or using dplyr: `library(dplyr); group_by(df, C) %>% summarise(sum(A), mean(B))` — grrgrrbla, Sep 16 '15 at 11:56

score 2 · Answer 1 · answered Sep 16 '15 at 12:10

2

Sorry but I don't follow how dealing with more than one column complicates things.

library(data.table)
dt <- data.table(df)
dt[,.(sum_a = sum(a),mean_b= mean(b)),by = c]

answered Sep 16 '15 at 12:10

HywelMJ

332
2
7

What if I want to group by more than one column `c`? What would the right syntax to use be? – gented Sep 17 '15 at 15:41
df <- data.frame(a=rnorm(6), b=rnorm(6), c=c("A", "B", "C", "A","D","A"), d = c(rep(letters[1:3],2))); dt <- data.table(df); dt[,.(sum_a = sum(a),mean_b= mean(b)),by = .(c,d)] – HywelMJ Sep 17 '15 at 16:07

Federico Rotolo · Answer 2 · 2015-09-16T14:42:55.223

1

like this?

    mapply(Vectorize(function(x, y) aggregate(
      df[, x], by=list(df[, 3]), FUN=y), SIMPLIFY = F), 
      1:2, c('sum', 'mean'))

edited Sep 16 '15 at 14:42

answered Sep 16 '15 at 12:23

Federico Rotolo

47
3

This gives me: [,1] [,2] Group.1 factor,3 factor,3 x Numeric,3 Numeric,3 – HywelMJ Sep 16 '15 at 13:34

Aggregate multiple columns according to some others

2 Answers2