In data.table, how can I do the following:
- Group table by several columns combined
- Then hand over each group to a custom aggregation function, which:
- Takes all columns from the group table subset and aggregates them by returning several new columns which will be added to the table
The trick here is to generate several new columns without calling the aggregation function more than once.
Example:
library(data.table)
mtcars_dt <- data.table(mtcars)
returnsOneColumn <- function(dt_group_all_columns){
"returned_value_1"
}
# works great, returns one new column as summary per group
mtcars_dt[,
list( new_column_1 = returnsOneColumn(dt_group_all_columns= .SD) ),
by = c("mpg", "cyl"),
.SDcols = colnames(mtcars_dt)
]
returnsMultipleColumns <- function (dt_group_all_columns){
list( "new_column_1" = "returned_value_1",
"new_column_2" = "returned_value_2" )
}
# does not work: Ideally, I would like to have mpg, cyl, and several columns
# generated from once calling returnsMultipleColumns
mtcars_dt[,
list( returnsMultipleColumns(dt_group_all_columns = .SD) ),
by = c("mpg", "cyl"),
.SDcols = colnames(mtcars_dt)
]
# desired output should look like this
#
# mpg cyl new_column_1 new_column_2
# 1: 21.0 6 returned_value_1 returned_value_2
# 2: 22.8 4 returned_value_1 returned_value_2
# 3: 21.4 6 returned_value_1 returned_value_2
# 4: 18.7 8 returned_value_1 returned_value_2
Related: