I have a data frame with many columns, some of which are measure variables. I would like to extract a bunch of summary statistics from the latter using data.table. My problem is the following: how to rename the aggregated columns according to the function that was used?
I want to have an aggregated data.table with column names like: c("measure1_mean", "measure1_sd", "measure2_mean", "measure2_sd", ...)
My code looks like this:
library(data.table)
library(stringr)
dt <- data.table(meas1=1:10,
meas2=seq(5,25, length.out = 10),
meas3=rnorm(10),
groupvar=rep(LETTERS[1:5], each=2))
measure_cols <- colnames(dt)[str_detect(colnames(dt), "^meas")]
dt_agg <- dt[, c(lapply(.SD, mean),
lapply(.SD, sd)),
by=groupvar, .SDcols = measure_cols]
# Does not work because of duplicates in rep(measure_cols, 3)
agg_names <- c(measure_cols, paste(rep(c("mean", "sd"), each=length(measure_cols)), measure_cols, sep="_"))
setnames(dt_agg, rep(measure_cols,3), agg_names)
This chunk effectively extracts the statistics but returns columns with identical names. Therefore I cannot use something like setnames(dt, old, new)
because duplicates exist in my 'old' vector.
I came across this post: Rename aggregated columns using data.table in R. But I do not like the accepted solution because it relies on column index, and not names, to rename the columns.