I want to aggregate the rows of a data.table, but the aggregation function depends on the name of the column.
For example, if the column name is:
- variable1 or variable2, then apply the mean() function.
- variable3, then apply the max() function.
- variable4, then apply the sd() function.
My data.tables always have a datetime column: I want to aggregate rows by time.
However, the number of "data" columns can vary.
I know how to do that with the same aggregation function (e.g. mean()) for all columns:
dt <- dt[, lapply(.SD, mean),
by = .(datetime = floor_date(datetime, timeStep))]
Or for only a subset of columns:
cols <- c("variable1", "variable2")
dt <- dt[ ,(cols) := lapply(.SD, mean),
by = .(datetime = floor_date(datetime, timeStep)),
.SDcols = cols]
What I would like to do is something like:
colsToMean <- c("variable1", "variable2")
colsToMax <- c("variable3")
colsToSd <- c("variable4")
dt <- dt[ ,{(colsToMean) := lapply(.SD???, mean),
(colsToMax) := lapply(.SD???, max),
(colsToSd) := lapply(.SD???, sd)},
by = .(datetime = floor_date(datetime, timeStep)),
.SDcols = c(colsToMean, colsToMax, colsToSd)]
I looked at "data.table in R - apply multiple functions to multiple columns", which gave me the idea to use a custom function:
myAggregate <- function(x, columnName) {
  FUN <- getAggregateFunction(columnName) # Returns mean, max or sd
  return(FUN(x))
}
dt <- dt[, lapply(.SD, myAggregate, ???columName???),
by = .(datetime = floor_date(datetime, timeStep))]
But I don't know how to pass the current column name to myAggregate()...
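One possible way to get the column name into the function is to iterate over .SD and names(.SD) in parallel with Map() instead of lapply(). The following is only a minimal sketch under assumptions: the aggFuns lookup list, the body of getAggregateFunction() and the sample data are hypothetical and made up for illustration, and timeStep is assumed to be a unit string that floor_date() accepts.

```r
library(data.table)
library(lubridate)

# Hypothetical mapping from column name to aggregation function (assumption)
aggFuns <- list(variable1 = mean, variable2 = mean,
                variable3 = max,  variable4 = sd)

# Stand-in for getAggregateFunction() from the question (assumption)
getAggregateFunction <- function(columnName) aggFuns[[columnName]]

myAggregate <- function(x, columnName) {
  FUN <- getAggregateFunction(columnName)
  FUN(x)
}

# Made-up sample data: two rows in each of two hours
dt <- data.table(
  datetime  = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + c(0, 600, 3600, 4200),
  variable1 = c(1, 3, 5, 7),
  variable2 = c(2, 4, 6, 8),
  variable3 = c(10, 20, 30, 40),
  variable4 = c(1, 3, 2, 6)
)
timeStep <- "1 hour"

# Map() calls myAggregate(column, name) once per column of .SD,
# so each call receives both the values and the column name
res <- dt[, Map(myAggregate, .SD, names(.SD)),
          by = .(datetime = floor_date(datetime, timeStep)),
          .SDcols = names(aggFuns)]
print(res)
```

Because Map() keeps the names of its first argument, the result columns keep the original column names. When the set of data columns varies, .SDcols could be restricted to the columns actually present, for example with intersect(names(dt), names(aggFuns)).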