21

I'd like to do the equivalent of the following, but with data.table's "by":

dt <- data.table(V1=rnorm(100), V2=rnorm(100), V3=rnorm(100), ...
                 group=rbinom(100,2,.5))
dt.agg <- aggregate(dt, by=list(dt$group), FUN=mean)

I know that I could do this:

dt.agg <- dt[, list(V1=mean(V1), V2=mean(V2), V3=mean(V3)), by=group]

But for the case I'm considering I have 100 or so columns V1-V100 (and I always want to aggregate all of them by a single factor, as in aggregate above) so the data.table solution I've got above isn't feasible.

smci
  • 32,567
  • 20
  • 113
  • 146
stackoverflax
  • 1,077
  • 3
  • 11
  • 25
  • 9
    For reference, among the copious examples in `?data.table` is the following: `DT[,lapply(.SD,sum),by=x]`. – joran Aug 06 '13 at 21:53
  • @joran, could you please explain the role of .SD? – POTENZA Jul 23 '15 at 00:32
  • `.SD` refers to subset of data. – Ic3fr0g Sep 01 '17 at 13:29
  • 1
    `.SD.` means "all columns, other than the one(s) used in the grouping by-expression". Please read [What does .SD stand for in data.table in R](https://stackoverflow.com/questions/8508482/what-does-sd-stand-for-in-data-table-in-r). In this case, `.SD` is equivalent to `.(V1,V2,V3)` or `list(V1,V2,V3)`, but it's faster than explicitly referencing them by name. – smci Mar 24 '19 at 03:29

1 Answers1

45
dt[, lapply(.SD, mean), by=group]

To specifiy columns:

dt[,...,by=group, .SDcols=c("V1", "V2", "V3", ...)]
dt[,...,by=group, .SDcols=names(dt)[1:100]]
eddi
  • 49,088
  • 6
  • 104
  • 155
Señor O
  • 17,049
  • 2
  • 45
  • 47