Can I use data.table's inherent speed to get a faster row-by-row t.test result, with variable column names? My current code is below; it takes a few seconds for every 1000 rows.
library(data.table)

slow.diffexp <- function(dt, samples1, samples2) {
  for (i in 1:nrow(dt)) {
    if (i %% 1000 == 0) {
      cat(i, "\n")  # progress report every 1000 rows
    }
    a <- t.test(dt[i, samples1, with=FALSE],
                dt[i, samples2, with=FALSE])
    set(dt, i, "tt.p.value", a$p.value)
    set(dt, i, "tt.mean1", a$estimate[1])
    set(dt, i, "tt.mean2", a$estimate[2])
  }
}
test.dt <- data.table(V1=sample(1000, 100000, replace=TRUE))
for (i in 2:20) {
  colname <- paste0("V", i)
  test.dt[, (colname) := sample(1000, 100000, replace=TRUE)]
}
samples1 <- sample(names(test.dt), size=10)
samples2 <- setdiff(names(test.dt), samples1)
slow.diffexp(test.dt, samples1, samples2)
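One direction I have been experimenting with is dropping t.test() entirely and computing the Welch statistics for all rows at once with vectorized matrix operations. The fast.diffexp below is my own sketch, not verified against t.test() for edge cases (e.g. zero-variance rows):

```r
library(data.table)

# Sketch: vectorized Welch two-sample t-test over all rows at once.
# Computes the same p-value and group means that t.test() reports,
# but with whole-matrix operations instead of a per-row call.
fast.diffexp <- function(dt, samples1, samples2) {
  m1 <- as.matrix(dt[, samples1, with=FALSE])
  m2 <- as.matrix(dt[, samples2, with=FALSE])
  n1 <- ncol(m1); n2 <- ncol(m2)
  mean1 <- rowMeans(m1)
  mean2 <- rowMeans(m2)
  # row-wise sample variances (mean1/mean2 recycle down the columns)
  var1 <- rowSums((m1 - mean1)^2) / (n1 - 1)
  var2 <- rowSums((m2 - mean2)^2) / (n2 - 1)
  se2 <- var1/n1 + var2/n2
  tstat <- (mean1 - mean2) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((var1/n1)^2/(n1 - 1) + (var2/n2)^2/(n2 - 1))
  dt[, tt.p.value := 2 * pt(-abs(tstat), df)]  # two-sided p-value
  dt[, tt.mean1 := mean1]
  dt[, tt.mean2 := mean2]
}
```

As I understand it, this replaces one R-level t.test() call per row with a handful of whole-matrix operations, which is where any speedup would come from.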
I have looked at the following related posts:
- Paired t-test for each row of a data table: has a solution but can we get faster?
- Doing t.test for columns for each row in data set: does not use data.table; also slow
I'm using set() because my understanding is that set() is faster than <- assignment on a data.frame, since it writes by reference and avoids the copying done by [<-.
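For reference, this is roughly how I have been checking that belief (a hypothetical micro-benchmark of my own, not something from the documentation):

```r
library(data.table)

n <- 1e5
dt <- data.table(x = numeric(n))
df <- data.frame(x = numeric(n))

# per-row assignment by reference via set()
t.set <- system.time(for (i in 1:1000) set(dt, i, "x", 1))["elapsed"]
# per-row assignment via [<- on a plain data.frame
t.assign <- system.time(for (i in 1:1000) df[i, "x"] <- 1)["elapsed"]

cat("set():", t.set, "s   [<-:", t.assign, "s\n")
```

On my machine the set() loop is much faster, but either way both are still row-by-row, which is why I'm hoping for a vectorized answer.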