2

i have a matrix data frame 6940 rows and 100 columns. I need to find 5 days cumulative at a time on the data set. Right now I was able to build a for loop code for this as follows :

cum<- matrix(data=q1,nrow=6940,ncol=100)
for (j in 1:100){
  for (i in 1:6940){
    cum[i,j]<-sum(q1[i,j],q1[i+1,j],q1[i+2,j],q1[i+3,j],q1[i+4,j],na.rm=T)
  }
}

I wanted to know whether there is any function in apply family to do the same, as this code is very time consuming.

for example if i generate a data frame using the command

 ens <- matrix(rnorm(200),20)

I want cumulative sum of 5 rows a time. i.e sum of row1:row5, row2:row6, row3:row7 and so on in a form of data frame.

i tried using apply function in this form :

apply(apply(apply(apply( apply(m, 2, cumsum),2, cumsum), 2, cumsum),2,cumsum),2,cumsum)

but the problem is I don't get the cumulative in blocks of 5, only an overall cumulative.

lawyeR
  • 7,488
  • 5
  • 33
  • 63
saurabh
  • 391
  • 1
  • 2
  • 11
  • Please provide a [minimal reproducible example](http://stackoverflow.com/a/5963610/1412059). – Roland Oct 23 '15 at 07:54

3 Answers3

5

Here is one approach using the stats::filter function to calculate the rolling sums and apply to loop over the columns:

m <- matrix(1:48, ncol = 4)
#      [,1] [,2] [,3] [,4]
# [1,]    1   13   25   37
# [2,]    2   14   26   38
# [3,]    3   15   27   39
# [4,]    4   16   28   40
# [5,]    5   17   29   41
# [6,]    6   18   30   42
# [7,]    7   19   31   43
# [8,]    8   20   32   44
# [9,]    9   21   33   45
#[10,]   10   22   34   46
#[11,]   11   23   35   47
#[12,]   12   24   36   48

apply(m, 2, filter, filter = rep(1, 5), sides = 1)
#      [,1] [,2] [,3] [,4]
# [1,]   NA   NA   NA   NA
# [2,]   NA   NA   NA   NA
# [3,]   NA   NA   NA   NA
# [4,]   NA   NA   NA   NA
# [5,]   15   75  135  195
# [6,]   20   80  140  200
# [7,]   25   85  145  205
# [8,]   30   90  150  210
# [9,]   35   95  155  215
#[10,]   40  100  160  220
#[11,]   45  105  165  225
#[12,]   50  110  170  230

This might have to be adjusted depending on how you want to handle windows with less than 5 values (e.g., here in the beginning).

Roland
  • 127,288
  • 10
  • 191
  • 288
  • this approach really looks cool, i will verify with my data once and mark it as an answer. really really cool stuff man. tnx :) – saurabh Oct 23 '15 at 08:04
  • can you please also explain the code can be really helpful. especially the use of filter inside apply. tnx – saurabh Oct 23 '15 at 08:13
  • Read `help("apply")`. It takes a matrix-like object, the margin (here the columns), a function (here `filter`) and further arguments to the function (see `help("filter")` for these). – Roland Oct 23 '15 at 08:15
  • there is an error with the approach when i tried it on my machine: > m <- matrix(1:48, ncol = 4) > apply(m, 2, filter, filter = rep(1, 5), sides = 1) Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('integer', 'numeric')" – saurabh Oct 23 '15 at 08:23
  • sorry just got it, we require to load package mass before loading package stats :) – saurabh Oct 23 '15 at 08:26
  • @saurabh `stats` is a base package. It is loaded when R is launched. –  Oct 23 '15 at 08:29
  • stats should be loaded by default. (Note that dplyr masks `filter` in which case you have to specify `stats::filter` explicitly.) – Roland Oct 23 '15 at 08:29
  • yeah i guess R was using dplyr filter in my case .. (Y) – saurabh Oct 23 '15 at 08:33
5

Another option is roll_sum (Data from @Roland's post)

library(RcppRoll)
apply(m, 2, roll_sumr, 5)
#       [,1] [,2] [,3] [,4]
# [1,]   NA   NA   NA   NA
# [2,]   NA   NA   NA   NA
# [3,]   NA   NA   NA   NA
# [4,]   NA   NA   NA   NA
# [5,]   15   75  135  195
# [6,]   20   80  140  200
# [7,]   25   85  145  205
# [8,]   30   90  150  210
# [9,]   35   95  155  215
#[10,]   40  100  160  220
#[11,]   45  105  165  225
#[12,]   50  110  170  230

As @alexis_laz mentioned in the comments, roll_sumr can take matrix as well. It is more efficient.

roll_sumr(m, 5, by = 1)

Benchmarks

set.seed(24)
m1 <- matrix(sample(1:50, 5000*5000, replace=TRUE), ncol=5000)
system.time(apply(m1, 2, roll_sumr, 5))
# user  system elapsed 
# 1.84    0.16    1.99 

system.time(roll_sumr(m1, 5, by = 1))
#  user  system elapsed 
#  0.59    0.15    0.74 

system.time(apply(m1, 2, stats::filter, filter = rep(1, 5), sides = 1))
#  user  system elapsed 
#  4.46    0.20    4.68 
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Another approach, less sophisticated: Created 5 variable and sum by the variable 5 time. Here:

m <- data.table(matrix(1:48, ncol = 4))
m[, index := .I]

m[, i1 := floor((index - 1) / 5) * 5 + 1]
m[, i2 := floor((index - 2) / 5) * 5 + 2]
m[, i3 := floor((index - 3) / 5) * 5 + 3]
m[, i4 := floor((index - 4) / 5) * 5 + 4]
m[, i5 := floor((index - 5) / 5) * 5 + 5]

cumsumm <- rbindlist(list(m[, list(value = sum(V1)), by = "i1"]
          , m[, list(value = sum(V1)), by = "i2"]
          , m[, list(value = sum(V1)), by = "i3"]
          , m[, list(value = sum(V1)), by = "i4"]
          , m[, list(value = sum(V1)), by = "i5"]), use.names=F)[i1 > 0, ]
YCR
  • 3,794
  • 3
  • 25
  • 29