I'm trying to improve the efficiency of the following simple data.table
code by combining it into a single call, instead of repeatedly specifying by = "group".
# data
library(data.table)
DT <- data.table(group = c(rep("a", 40), rep("b", 40)),
                 other = rnorm(80),
                 num = c(1:80))
# reduce this to one "by" call
DT[, c1 := ifelse(num <= 7, NA, num), by = "group"]
DT[, sprintf("c%d", 2:10) := shift(c1, 1:9, type = 'lag'), by = "group"]
DT[, d1 := shift(c10, 1, type = 'lag'), by = "group"]
DT[, sprintf("d%d", 2:10) := shift(d1, 1:9, type = 'lag'), by = "group"]
DT[, e1 := shift(d10, 1, type = 'lag'), by = "group"]
DT[, sprintf("e%d", 2:10) := shift(e1, 1:9, type = 'lag'), by = "group"]
I'm after something like the following (not valid syntax, just to show the idea):
DT[, .(c1 := ifelse(num <= 7, NA, num),
       sprintf("c%d", 2:10) := shift(c1, 1:9, type = 'lag'),
       d1 := shift(c10, 1, type = 'lag'),
       sprintf("d%d", 2:10) := shift(d1, 1:9, type = 'lag'),
       e1 := shift(d10, 1, type = 'lag'),
       sprintf("e%d", 2:10) := shift(e1, 1:9, type = 'lag')), by = "group"]
Edit:
This is similar to, but slightly different from, this question, because the variables created here are not independent of one another: each new column is built from a previously created one.
Any suggestions?
Thanks
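
One possible direction, offered as a sketch rather than a definitive answer: each column above is a lag of an earlier column, and within a group lags with NA fill compose additively (lagging by a and then by b gives the same result as lagging by a + b). Under that assumption, c1 to c10, d1 to d10 and e1 to e10 are simply lags 0 to 29 of c1, so a single grouped call could create all of them. The names cols and c1_vals are introduced here only for illustration.

```r
library(data.table)

set.seed(1)  # only so the example is reproducible
DT <- data.table(group = c(rep("a", 40), rep("b", 40)),
                 other = rnorm(80),
                 num = c(1:80))

# Target column names in lag order: c1..c10, d1..d10, e1..e10
cols <- c(sprintf("c%d", 1:10), sprintf("d%d", 1:10), sprintf("e%d", 1:10))

# One grouped call: compute c1 once per group, then append its lags 1..29.
# Assumption: d1 = lag(c10, 1) = lag(c1, 10), e1 = lag(d10, 1) = lag(c1, 20), etc.
DT[, (cols) := {
  c1_vals <- ifelse(num <= 7, NA, num)
  c(list(c1_vals), shift(c1_vals, 1:29, type = "lag"))
}, by = "group"]
```

Whether this is actually faster than the six separate calls would need benchmarking on data of realistic size.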