I've read several related threads on how to apply many different functions to many different columns in data.tables. This one about columns and groups and a similar one here. They were both very helpful, but I am looking for a more elegant solution to something quite similar.
From the two above links, the following code:
library(data.table)
DT <- data.table(x= rnorm(50), y = rnorm(50), treatment = c(0,1))
vars <- c("x", "y")
my.summary <- function(x) c(Mean = mean(x, na.rm = TRUE), Min = min(x, na.rm = TRUE), Q1 = quantile(x, 0.25, na.rm = TRUE),
Median = median(x, na.rm = TRUE), Q3 = quantile(x, 0.75, na.rm = TRUE), Max = max(x, na.rm = TRUE))
summ_stats <- DT[, as.list(unlist(lapply(.SD, my.summary))), .SDcols = vars, by = .(treatment)]
produces summary statistics for the variables in vars by treatment status. The output looks something like:
x.Mean, x.Min, x.Q1.25%, ..., x.Max, y.Mean, y.Min, ..., y.Max
x
y
I am looking for something similar, but what I want exactly (using data.table's speed) is something that looks like:
variable  treatment  Mean  Min  Q1  Median  Q3  Max  p.value
x         0
          1
y         0
          1
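To make the target shape concrete, here is a rough sketch of one way it might be produced. It assumes reshaping to long format with melt() and then grouping by variable and treatment. I have not settled which test p.value should come from, so the two-sample t-test between treatment groups is purely an assumption for illustration, as are the names long, summ_long, pvals and result.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
DT <- data.table(x = rnorm(50), y = rnorm(50), treatment = rep(c(0, 1), 25))
vars <- c("x", "y")

# Reshape to long format: one row per (treatment, variable, value)
long <- melt(DT, id.vars = "treatment", measure.vars = vars,
             variable.name = "variable", value.name = "value")

# One row per variable/treatment pair, one column per statistic
summ_long <- long[, .(Mean   = mean(value, na.rm = TRUE),
                      Min    = min(value, na.rm = TRUE),
                      Q1     = quantile(value, 0.25, na.rm = TRUE, names = FALSE),
                      Median = median(value, na.rm = TRUE),
                      Q3     = quantile(value, 0.75, na.rm = TRUE, names = FALSE),
                      Max    = max(value, na.rm = TRUE)),
                  keyby = .(variable, treatment)]

# Assumption: p.value is a two-sample t-test of treatment 0 vs 1, per variable
pvals <- long[, .(p.value = t.test(value[treatment == 0],
                                   value[treatment == 1])$p.value),
              by = variable]

# Attach the per-variable p-value to each variable/treatment row
result <- merge(summ_long, pvals, by = "variable")
print(result)
```

This gives one row per variable/treatment pair, with the p-value repeated on both rows of each variable. Whether melting is fast enough on large data is something I have not checked.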
Any suggestions would be greatly appreciated!