13
dt <- data.table(x=c(1, .9, .8, .75, .5, .1))
dt
      x
1: 1.00
2: 0.90
3: 0.80
4: 0.75
5: 0.50
6: 0.10

For each row, how do I get the product of x for that row and the next two rows?

      x Prod.3
1: 1.00 0.7200
2: 0.90 0.5400
3: 0.80 0.3000
4: 0.75 0.0375
5: 0.50     NA
6: 0.10     NA

More generally, for each row, how do I get the product of x for that row and the next n rows?

Ben
  • 20,038
  • 30
  • 112
  • 189

4 Answers4

16

Here's another possible version using data.table::shift combined with Reduce (as per @Aruns comment)

library(data.table) #v1.9.6+
N <- 3L
dt[, Prod3 := Reduce(`*`, shift(x, 0L:(N - 1L), type = "lead"))]

shift is vectorized, meaning it can create several new columns at once depending on the vector passed to the n argument. Then, Reduce is basically applies * to all the vectors at once element-wise.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • 7
    this is cool.. you could do: `Reduce(\`*\`, shift(dt, 0:2, type="lead"))` – Arun Jun 02 '15 at 19:42
  • 1
    Incredible speed up! is it possible to make it work on variable `n`? Then I could apply it on that: http://stackoverflow.com/q/21368245/2490497 – jangorecki Jun 02 '15 at 21:42
  • @jan Im not in front of a computer right now so will take a look tomorrow. Also not sure what you mean by "variable n". – David Arenburg Jun 02 '15 at 21:57
  • @DavidArenburg making the rolling window vector instead of scalar gives a way to code adaptive moving averages. – jangorecki Jun 02 '15 at 22:33
15

Here are two ways.. albeit not the most efficient implementations possible:

require(data.table)
N = 3L
dt[, prod := prod(dt$x[.I:(.I+N-1L)]), by=1:nrow(dt)]

Another one using embed():

tmp = apply(embed(dt$x, N), 1, prod)
dt[seq_along(tmp), prod := tmp]

Benchmarks:

set.seed(1L)
dt = data.table(x=runif(1e6))
zoo_fun <- function(dt, N) {
    rollapply(dt$x, N, FUN=prod, fill=NA, align='left')
}

dt1_fun <- function(dt, N) {
    dt[, prod := prod(dt$x[.I:(.I+N-1L)]), by=1:nrow(dt)]
    dt$prod
}

dt2_fun <- function(dt, N) {
    tmp = apply(embed(dt$x, N), 1L, prod)
    tmp[1:nrow(dt)]
}

david_fun <- function(dt, N) {
    Reduce(`*`, shift(dt$x, 0:(N-1L), type="lead"))
}

system.time(ans1 <- zoo_fun(dt, 3L))
#    user  system elapsed 
#   8.879   0.264   9.221 
system.time(ans2 <- dt1_fun(dt, 3L))
#    user  system elapsed 
#  10.660   0.133  10.959
system.time(ans3 <- dt2_fun(dt, 3L))
#    user  system elapsed 
#   1.725   0.058   1.819 
system.time(ans4 <- david_fun(dt, 3L))
#    user  system elapsed 
#   0.009   0.002   0.011 

all.equal(ans1, ans2) # [1] TRUE
all.equal(ans1, ans3) # [1] TRUE
all.equal(ans1, ans4) # [1] TRUE
Arun
  • 116,683
  • 26
  • 284
  • 387
  • in this code, only one column is being used. How does one pass on a number of columns the SDcols way? dt[, prod := prod(dt$x[.I:(.I+N-1L)]), by=1:nrow(dt)]. – ashleych Oct 02 '17 at 12:42
  • 1
    @Arun It's really amazing how much can be achieved within data.table `[.]` by manipulating `.N`, `.I` and `.SD`. Wouldn't need to remember a lot many other functions! – Nikhil Vidhani Jun 28 '18 at 16:35
10

you can try

library(zoo)
rollapply(dt, 3, FUN = prod)
          x
[1,] 0.7200
[2,] 0.5400
[3,] 0.3000
[4,] 0.0375

To match the expected output

dt[, Prod.3 :=rollapply(x, 3, FUN=prod, fill=NA, align='left')]
Alex A.
  • 5,466
  • 4
  • 26
  • 56
Mamoun Benghezal
  • 5,264
  • 7
  • 28
  • 33
  • 1
    This is slow (as shown by Arun), but it's simple and quick enough for my use case. – Ben Jun 02 '15 at 20:57
1

Now data.table has fast rolling functions. So @Mamoun Benghezal 's approach can be used as

dt[, Prod.3 := frollapply(x, 3, FUN=prod, fill=NA, align='left')]

This is very fast, though not as fast as @David Arenburg 's function. Using @Arun 's benchmark:

set.seed(1L)
dt = data.table(x=runif(1e6))

froll_fun <- function(dt, N) {
    frollapply(dt$x, N, FUN = prod, fill = NA, align = 'left')
}

system.time(ans5 <- froll_fun(dt, 3L))
#  user  system elapsed 
# 0.191   0.000   0.191 
James Hirschorn
  • 7,032
  • 5
  • 45
  • 53