Summarize a data.table with many variables by variable

Question

Suppose I have a data.table `dt`:

    pga  fgm  fga  tgp  mode
1:  0.2  0.1  0.9  7.3     1
2:  1.3  7.5  8.3  8.3     3
3:  2.0  7.7  6.3  7.7     2
4:  7.3  3.6  7.0  6.6     1
5:  6.7  0.3  8.3  0.6     2
6:  5.0  3.7 -1.1 -3.2     1
....

I need to calculate the mean of each variable, grouped by `mode`, and get a data.table like the following:

    mode   pga   fgm   fga   tgp
1:     1  0.23  0.11  10.9  7.23
2:     2  1.32  73.5  85.3  8.33
3:     3  2.06  7.75  6.33  7.47
4:     4  6.32  32.6  7.01  6.16
....

There is a one-liner that does this:

    dt[, list(pga = mean(pga), fgm = mean(fgm), fga = mean(fga), tgp = mean(tgp)), by = mode]

That's fine when there are only 4 variables. However, in the real data there are ~1000 variables. How can I modify the script for the actual task?

`dt[, lapply(.SD, mean), by = mode]` is the standard way to run a function across all data table columns. Although I don't know why you have `sum()` for the first column. This is definitely a duplicate. — Rich Scriven, Apr 13 '16 at 01:36
Bonus trick: use `.SDcols` to summarize many, but not all columns, e.g., `some_cols <- c("pga", "fgm", "fga", "tgp"); dt[ , lapply(.SD, mean), by = mode, .SDcols = some_cols]` — MichaelChirico, Apr 13 '16 at 01:50
@HaddE.Nuff Would you mind writing your comment as an answer I can upvote? — Loom, Apr 13 '16 at 01:52
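A minimal, self-contained sketch of the `lapply(.SD, mean)` approach from the comments, using the six sample rows shown in the question and assuming the data.table package is installed. `.SD` holds the non-grouping columns of each group, so the same call works whether there are 4 columns or 1000. It uses `keyby` instead of `by` only so the output is sorted by `mode` (`by` keeps groups in order of first appearance). The object names `means_all`, `means_some` and `some_cols` are illustrative.

```r
library(data.table)

# The six sample rows shown in the question
dt <- data.table(
  pga  = c(0.2, 1.3, 2.0, 7.3, 6.7, 5.0),
  fgm  = c(0.1, 7.5, 7.7, 3.6, 0.3, 3.7),
  fga  = c(0.9, 8.3, 6.3, 7.0, 8.3, -1.1),
  tgp  = c(7.3, 8.3, 7.7, 6.6, 0.6, -3.2),
  mode = c(1, 3, 2, 1, 2, 1)
)

# .SD is the subset of non-grouping columns for the current group,
# so lapply() applies mean() to each of them, however many there are.
# keyby (rather than by) also sorts the result by mode.
means_all <- dt[, lapply(.SD, mean), keyby = mode]
print(means_all)

# Summarize only some columns by naming them in .SDcols
some_cols <- c("pga", "fgm")
means_some <- dt[, lapply(.SD, mean), keyby = mode, .SDcols = some_cols]
print(means_some)
```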

score 0 · Answer 1 · answered Apr 13 '16 at 01:50

With dplyr:

    library(dplyr)
    # apply mean() to every non-grouping column
    dt %>%
       group_by(mode) %>%
       summarise(across(everything(), mean))

akrun
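A runnable sketch of the dplyr route, assuming dplyr 1.0 or later, where `across()` is the replacement for the deprecated `summarise_each()` and `funs()`. It builds the question's six sample rows as a plain data frame so that only dplyr is needed, and uses `all_of()` to summarize a subset of columns, the dplyr counterpart of `.SDcols`. The object names are illustrative.

```r
library(dplyr)

# The six sample rows shown in the question, as a plain data frame
df <- data.frame(
  pga  = c(0.2, 1.3, 2.0, 7.3, 6.7, 5.0),
  fgm  = c(0.1, 7.5, 7.7, 3.6, 0.3, 3.7),
  fga  = c(0.9, 8.3, 6.3, 7.0, 8.3, -1.1),
  tgp  = c(7.3, 8.3, 7.7, 6.6, 0.6, -3.2),
  mode = c(1, 3, 2, 1, 2, 1)
)

# Mean of every non-grouping column; groups come out sorted by mode
means_all <- df %>%
  group_by(mode) %>%
  summarise(across(everything(), mean))

# Mean of selected columns only, given as a character vector
means_some <- df %>%
  group_by(mode) %>%
  summarise(across(all_of(c("pga", "fgm")), mean))

print(means_all)
print(means_some)
```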