1

I have a data.table and would like to run multiple aggregations on multiple columns while the table is grouped on another variable. I have tried the following:

library(data.table)

DT <- data.table(a = 1:10,
                 b = 10:1,
                 group = rep(1:2, each=5))

aggs <- function(x) list(mean = mean(x), sd = sd(x))

DT[, lapply(.SD, aggs), .(group), .SDcols = c('a', 'b')]

This doesn't quite work as I would either need the names() as a column or for the output to be split into columns - say a.mean, b.mean, etc.:

   group        a        b
1:     1        3        8
2:     1 1.581139 1.581139
3:     2        8        3
4:     2 1.581139 1.581139
Jealie
  • 6,157
  • 2
  • 33
  • 36
RoyalTS
  • 9,545
  • 12
  • 60
  • 101
  • With regards to [this answer](https://stackoverflow.com/questions/42163675/data-table-lapply-a-function-with-multicolumn-output#42164417) - maybe `melt(DT, id="group")[,Reduce(c, lapply(.SD, aggs)),"group,variable"]` or `dcast(melt(DT, id="group")[,Reduce(c, lapply(.SD, aggs)),"group,variable"],group~variable, value.var=c("mean","sd"))` ? – lukeA Jul 07 '17 at 21:36
  • 1
    @lukeA I guess there's no need to melt. Something like `dcast(DT, group ~ ., fun = list(mean, sd), value.var = c("a","b"))` albeit not with the OP's preferred col names. – Frank Jul 08 '17 at 18:32
  • Converting to wide format is usually a bad idea as it will literally impossible to use the resulting data set for further analysis. – David Arenburg Jul 09 '17 at 09:18

1 Answers1

1

You were close, missing an extra data.frame to cast the results the way you want:

DT[, data.frame(lapply(.SD, aggs)), by=group, .SDcols = c('a', 'b')]

giving:

   group a.mean     a.sd b.mean     b.sd
1:     1      3 1.581139      8 1.581139
2:     2      8 1.581139      3 1.581139
Jealie
  • 6,157
  • 2
  • 33
  • 36