This answer suggested using Reduce() and shift() for rolling window problems with data.table. This benchmark showed that the approach might be considerably faster than zoo::rollapply().
test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][]
#    return sec momentum
# 1:    0.1   A       NA
# 2:    0.1   A       NA
# 3:    0.1   A    0.331
# 4:    0.1   A    0.331
# 5:    0.1   A    0.331
# 6:    0.2   B       NA
# 7:    0.2   B       NA
# 8:    0.2   B    0.728
# 9:    0.2   B    0.728
#10:    0.2   B    0.728
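The expression works because shift(return + 1.0, 0:2, type="lag") returns a list of three vectors, i.e., the gross returns plus their first and second lags (NA-padded), and Reduce(`*`, ...) multiplies these vectors elementwise. So every row ends up with the product of its own and the two preceding gross returns. A minimal sketch on a plain vector (the return values are only for illustration):
library(data.table)
x <- c(0.10, 0.11, 0.12, 0.13, 0.14)   # illustrative returns
# shift() with n = 0:2 returns a list: x + 1 unshifted, lagged by 1, and lagged by 2
shift(x + 1, 0:2, type = "lag")
# elementwise product of the three list elements, minus 1
Reduce(`*`, shift(x + 1, 0:2, type = "lag")) - 1
# NA NA 0.367520 0.404816 0.442784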
Benchmark (10 rows, OP data set)
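zoo_fun() below is the rollapply() helper from the accepted answer and is not repeated here; judging from the calls and results below, it presumably looks roughly like this sketch:
library(zoo)
# assumed definition, reconstructed to match the calls below (not the accepted
# answer's exact code): rolling product of gross returns over a right-aligned
# window of width n, NA-filled at the start
zoo_fun <- function(x, n) {
  rollapplyr(x + 1, n, FUN = prod, fill = NA) - 1
}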
microbenchmark::microbenchmark(
zoo = test[, momentum := zoo_fun(return, 3), by = sec][],
red = test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][],
times = 100L
)
#Unit: microseconds
# expr      min       lq      mean   median        uq      max neval cld
#  zoo 2318.209 2389.131 2445.1707 2421.541 2466.1930 3108.382   100   b
#  red  562.465  625.413  663.4893  646.880  673.4715 1094.771   100   a
Benchmark (100k rows)
To verify the benchmark results with the small data set, a larger data set is constructed:
n_rows <- 1e4
test0 <- data.table(return = rep(as.vector(outer(1:5/100, 1:2/10, "+")), n_rows),
sec = rep(rep(c("A", "B"), each = 5L), n_rows))
test0
#        return sec
#     1:   0.11   A
#     2:   0.12   A
#     3:   0.13   A
#     4:   0.14   A
#     5:   0.15   A
#    ---
# 99996:   0.21   B
# 99997:   0.22   B
# 99998:   0.23   B
# 99999:   0.24   B
#100000:   0.25   B
As test is modified in place, each benchmark run is started with a fresh copy of test0.
microbenchmark::microbenchmark(
copy = test <- copy(test0),
zoo = {
test <- copy(test0)
test[, momentum := zoo_fun(return, 3), by = sec][]
},
red = {
test <- copy(test0)
test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][]
},
times = 10L
)
#Unit: microseconds
# expr         min          lq         mean      median          uq         max neval cld
# copy     282.619     294.512     325.3261     298.424     350.272     414.983    10   a
#  zoo 1129601.974 1144346.463 1188484.0653 1162598.499 1194430.395 1337727.279    10   b
#  red    3354.554    3439.095    6135.8794    5002.008    7695.948   11443.595    10   a
For 100k rows, the Reduce() / shift() approach is more than 200 times faster than zoo::rollapply() (median 5.0 ms versus 1.16 s, both timings including the copy of the data).
Apparently, there are different interpretations of what the expected result is.
To investigate this, a modified data set is used:
test <- data.table(return=c(0.1, 0.11, 0.12, 0.13, 0.14, 0.21, 0.22, 0.23, 0.24, 0.25),
sec=c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"))
test
#    return sec
# 1:   0.10   A
# 2:   0.11   A
# 3:   0.12   A
# 4:   0.13   A
# 5:   0.14   A
# 6:   0.21   B
# 7:   0.22   B
# 8:   0.23   B
# 9:   0.24   B
#10:   0.25   B
Note that the return values within each group vary, which is different from the OP's data set where the return values are constant within each sec group.
With this, the accepted answer (rollapply()) returns:
test[, momentum := zoo_fun(return, 3), by = sec][]
#    return sec momentum
# 1:   0.10   A       NA
# 2:   0.11   A       NA
# 3:   0.12   A 0.367520
# 4:   0.13   A 0.404816
# 5:   0.14   A 0.442784
# 6:   0.21   B       NA
# 7:   0.22   B       NA
# 8:   0.23   B 0.815726
# 9:   0.24   B 0.860744
#10:   0.25   B 0.906500
Henrik's answer computes the product only over the last three rows of each group and assigns that single value to those rows. It returns:
test[test[ , tail(.I, 3), by = sec]$V1, res := prod(return + 1) - 1, by = sec][]
#    return sec      res
# 1:   0.10   A       NA
# 2:   0.11   A       NA
# 3:   0.12   A 0.442784
# 4:   0.13   A 0.442784
# 5:   0.14   A 0.442784
# 6:   0.21   B       NA
# 7:   0.22   B       NA
# 8:   0.23   B 0.906500
# 9:   0.24   B 0.906500
#10:   0.25   B 0.906500
The Reduce()/shift() solution returns the same result as the accepted rollapply() answer:
test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][]
#    return sec momentum
# 1:   0.10   A       NA
# 2:   0.11   A       NA
# 3:   0.12   A 0.367520
# 4:   0.13   A 0.404816
# 5:   0.14   A 0.442784
# 6:   0.21   B       NA
# 7:   0.22   B       NA
# 8:   0.23   B 0.815726
# 9:   0.24   B 0.860744
#10:   0.25   B 0.906500
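To double-check that the Reduce()/shift() and rollapply() results agree numerically (apart from floating point noise due to the different order of multiplication), both can be stored side by side and compared. The column names mom_zoo and mom_red are only used for this illustration:
test[, mom_zoo := zoo_fun(return, 3), by = sec]
test[, mom_red := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec]
# compare both columns with a tolerance for floating point differences
test[, all.equal(mom_zoo, mom_red)]
# expected result: TRUE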