
I have a data.table (dt):

    pga  fgm  fga  tgp  mode
1:  0.2  0.1  0.9  7.3     1
2:  1.3  7.5  8.3  8.3     3
3:  2.0  7.7  6.3  7.7     2
4:  7.3  3.6  7.0  6.6     1
5:  6.7  0.3  8.3  0.6     2
6:  5.0  3.7 -1.1 -3.2     1
....

I need to calculate the mean of each variable, grouped by the variable mode, and get a data.table like the following:

    mode   pga   fgm   fga   tgp
1:     1  0.23  0.11  10.9  7.23
2:     2  1.32  73.5  85.3  8.33
3:     3  2.06  7.75  6.33  7.47
4:     4  6.32  32.6  7.01  6.16
....

There is a one-liner that performs the task:

dt[,list(pga=mean(pga), fgm=mean(fgm), fga=mean(fga), tgp=mean(tgp)), by=mode] 

That's fine if there are only 4 variables. However, in the real data the number of variables is ~1000. How can I modify the script for the actual task?

Loom
  • `dt[, lapply(.SD, mean), by = mode]` is the standard way to run a function across all data table columns. Although I don't know why you have `sum()` for the first column. This is definitely a duplicate. – Rich Scriven Apr 13 '16 at 01:36
  • @HaddE.Nuff - Thank you. I fixed the typo. – Loom Apr 13 '16 at 01:40
  • Bonus trick: use `.SDcols` to summarize many, but not all columns, e.g., `some_cols <- c("pga", "fgm", "fga", "tgp"); dt[ , lapply(.SD, mean), by = mode, .SDcols = some_cols]` – MichaelChirico Apr 13 '16 at 01:50
  • @HaddE.Nuff - Would you mind writing your comment as an answer so I can upvote it? – Loom Apr 13 '16 at 01:52
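
The two suggestions in the comments above can be tried on the sample rows from the question. This is only a minimal sketch: the six rows are copied from the printed table, and the names `all_means` and `some_means` (and the two-column choice for `some_cols`) are made up for illustration.

```r
library(data.table)

# Sample rows copied from the question; the real table has ~1000 columns.
dt <- data.table(
  pga  = c(0.2, 1.3, 2.0, 7.3, 6.7, 5.0),
  fgm  = c(0.1, 7.5, 7.7, 3.6, 0.3, 3.7),
  fga  = c(0.9, 8.3, 6.3, 7.0, 8.3, -1.1),
  tgp  = c(7.3, 8.3, 7.7, 6.6, 0.6, -3.2),
  mode = c(1, 3, 2, 1, 2, 1)
)

# .SD holds every column except the grouping column, one subset per group,
# so lapply(.SD, mean) returns one mean per column for each value of mode.
all_means <- dt[, lapply(.SD, mean), by = mode]

# .SDcols restricts .SD to a chosen subset of columns
# (some_cols is an illustrative choice, not from the question).
some_cols <- c("pga", "fgm")
some_means <- dt[, lapply(.SD, mean), by = mode, .SDcols = some_cols]

print(all_means)
print(some_means)
```

Because nothing is spelled out per column, the same call should work unchanged for ~1000 columns. If the real data contains missing values, extra arguments can be passed through `lapply`, for example `lapply(.SD, mean, na.rm = TRUE)`.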

1 Answer


With dplyr:

library(dplyr)
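# summarise_each() and funs() are deprecated in current dplyr; across() is the replacement.
dt %>%
   group_by(mode) %>%
   summarise(across(everything(), mean))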
akrun