This question is an extension of an earlier one: Apply multiple functions to multiple columns in data.table. Given a data.table

library(data.table)

DT <- data.table("a"=1:5,
                 "b"=2:6,
                 "c"=c(1, 1, 2, 2, 2))

I want to apply a list of functions to a and b grouping by c. If I don't group by c I get the expected result:

my.summary = function(x) list(mean = mean(x), median = median(x))
DT[, unlist(lapply(.SD, my.summary)), .SDcols = c("a", "b")]
# a.mean a.median   b.mean b.median 
#       3        3        4        4 

When doing the same operation, but grouping by c, I expected to get

 c a.mean a.median   b.mean b.median 
 1   1.5      1.5      2.5      2.5 
 2    4        4        5        5 

but instead I got

DT[, unlist(lapply(.SD, my.summary)), by = c, .SDcols = c("a", "b")]
   c  V1
1: 1 1.5
2: 1 1.5
3: 1 2.5
4: 1 2.5
5: 2 4.0
6: 2 4.0
7: 2 5.0
8: 2 5.0

It seems like the data has been melted, with no way to know which function has been applied (unless you know the order in my.summary). Any suggestions on how to solve this?

Henrik
J.C.Wahl
  • You may wrap your `j` in `as.list`. See "For the more general case" here: [Calculate multiple aggregations with lapply(.SD, …)](https://stackoverflow.com/a/24151832/1851712) – Henrik Jul 31 '20 at 10:44
  • Thank you! Should I delete the question since a similar question already exists? – J.C.Wahl Jul 31 '20 at 11:22
  • No, just keep it here as a signpost for future visitors. And thanks for posting a small example and sharing your research and code attempts. Cheers. – Henrik Jul 31 '20 at 11:34
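
Henrik's suggestion in the comments can be sketched roughly as follows, assuming the DT and the original my.summary from the question. Here unlist() flattens each group's result into a named numeric vector (which also coerces every value to double), and as.list() turns that vector back into a list so that each named element becomes its own column. The name res is introduced only for illustration.

```r
library(data.table)

DT <- data.table(a = 1:5, b = 2:6, c = c(1, 1, 2, 2, 2))
my.summary <- function(x) list(mean = mean(x), median = median(x))

# unlist() yields one named numeric vector per group;
# as.list() makes each named element a separate column.
res <- DT[, as.list(unlist(lapply(.SD, my.summary))), by = c, .SDcols = c("a", "b")]
print(res)
```

Because unlist() converts everything to a single type, this variant should not need the as.numeric() adjustment, at the cost of losing any non-numeric summaries.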

1 Answer

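First, you need to change your function. data.table expects each result column to have the same type in every group, and median can return an integer or a double depending on its input: for an integer vector it returns an integer when the number of values is odd and a double when it is even. Here group c = 1 has two rows and group c = 2 has three, so the types differ between groups.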

my.summary <- function(x) list(mean = mean(x), median = as.numeric(median(x)))
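
The type difference can be checked directly in base R. This is only a small illustration on integer vectors of length two and three, mirroring the group sizes in DT:

```r
# Integer input of even length: the median is the mean of the two middle values.
typeof(median(1:2))              # "double"

# Integer input of odd length: the median is the middle element itself.
typeof(median(1:3))              # "integer"

# Wrapping the result in as.numeric() gives a double in both cases.
typeof(as.numeric(median(1:3)))  # "double"
```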

Then you need to ensure that only the first level of the nested list is unlisted. The result of the unlist call still needs to be a list (remember, a data.table is a list of column vectors).

DT[, unlist(lapply(.SD, my.summary), recursive = FALSE), by = c, .SDcols = c("a", "b")]
#   c a.mean a.median b.mean b.median
#1: 1    1.5      1.5    2.5      2.5
#2: 2    4.0      4.0    5.0      5.0
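
To see what recursive = FALSE changes, here is a minimal base R sketch. The list sd_group1 is a hypothetical stand-in for what .SD holds in the first group (c == 1); it is not a data.table object.

```r
my.summary <- function(x) list(mean = mean(x), median = as.numeric(median(x)))

# Hypothetical stand-in for .SD in the first group: a plain list of columns.
sd_group1 <- list(a = 1:2, b = 2:3)
nested <- lapply(sd_group1, my.summary)

flat_vec  <- unlist(nested)                     # fully flattened: named numeric vector
flat_list <- unlist(nested, recursive = FALSE)  # one level removed: still a list

is.list(flat_vec)   # FALSE
is.list(flat_list)  # TRUE
names(flat_list)    # "a.mean" "a.median" "b.mean" "b.median"
```

An atomic vector returned from `j` ends up in a single column (V1 in the question), while a list gives one column per element.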
Roland