These three commands return the same result (a regression on a subset of observations). I'd like to know whether there are important differences in terms of what data.table actually does in the background.
suppressMessages(library("data.table"))
suppressMessages(library("biglm"))
N=1e7; K=100
set.seed(1)
DT <- data.table(
id = 1:N,
v1 = sample(5, N, TRUE), # int in range [1,5]
v2 = sample(1e6, N, TRUE), # int in range [1,1e6]
v3 = sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 23.5749
)
DT[, condition := id>100]
# first command
coefficients(biglm(v3 ~ v2 + v1, DT[id>100, c("v1", "v2", "v3"), with = FALSE]))
# second command
DT[ id >100, coefficients(biglm(v3 ~ v2 + v1, .SD)), .SDcols = c("v1", "v2", "v3")]
# third command
DT[, coefficients(biglm(v3 ~ v2 + v1, .SD)), by = condition, .SDcols = c("v1", "v2", "v3")]
If I run each command in a fresh R session, the time taken seems to be the same for all three. More generally, are these commands equivalent in terms of memory use and speed? Thanks!
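For reference, here is a rough sketch of how the three commands could be compared within a single session. It assumes a much smaller N than above so that it runs quickly, and `measure` is a hypothetical helper written for this illustration (it is not part of data.table or biglm). The peak-memory figure comes from `gc()` and is only approximate.

```r
library(data.table)
library(biglm)

# Smaller N than in the question so the sketch runs quickly (assumption).
N <- 1e5
set.seed(1)
DT <- data.table(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE),
  v3 = sample(round(runif(100, max = 100), 4), N, TRUE)
)
DT[, condition := id > 100]
cols <- c("v1", "v2", "v3")

# Hypothetical helper: elapsed time and approximate peak memory of an expression.
measure <- function(expr) {
  invisible(gc(reset = TRUE))                 # reset the "max used" counters
  elapsed <- system.time(res <- expr)[["elapsed"]]
  g <- gc()
  peak_mb <- sum(g[, ncol(g)])                # last column: max used, in Mb
  list(result = res, elapsed = elapsed, peak_mb = peak_mb)
}

# first command: subset to a new data.table, then fit
m1 <- measure(coefficients(biglm(v3 ~ v2 + v1, DT[id > 100, cols, with = FALSE])))
# second command: subset in i, fit on .SD in j
m2 <- measure(DT[id > 100, coefficients(biglm(v3 ~ v2 + v1, .SD)), .SDcols = cols])
# third command: fit per group; this also fits the condition == FALSE group
m3 <- measure(DT[, coefficients(biglm(v3 ~ v2 + v1, .SD)), by = condition, .SDcols = cols])

# Side-by-side timings (seconds) and peak memory (Mb)
print(data.frame(
  command = c("first", "second", "third"),
  elapsed = c(m1$elapsed, m2$elapsed, m3$elapsed),
  peak_mb = c(m1$peak_mb, m2$peak_mb, m3$peak_mb)
))
```

Passing `verbose = TRUE` inside `DT[...]` should also make data.table print some detail about what it is doing internally, which may help when comparing the second and third forms.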