I have 34 rasters (nrow: 17735, ncol: 11328, ncell: 200902080) with values 0 and 1, about 4 MB each. I want the cumulative sum of those values, resetting to 0 whenever a 0 is encountered.
I tried several alternatives based on the question "Cumulative sum that resets when 0 is encountered":
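To make the intended operation concrete, here is a minimal reference sketch on a toy vector. The function name `reset_cumsum_ref` and the sample values are made up for illustration. The plain loop is only meant to pin down the expected result, not to be fast.

```r
# Reference definition (hypothetical helper, for illustration only):
# keep a running total and set it back to 0 every time a 0 is seen.
reset_cumsum_ref <- function(x) {
  out <- numeric(length(x))
  run <- 0
  for (i in seq_along(x)) {
    run <- if (x[i] == 0) 0 else run + x[i]
    out[i] <- run
  }
  out
}

# Toy input, not the real raster data
reset_cumsum_ref(c(1, 1, 0, 1, 1, 1, 0, 1))
# 1 2 0 1 2 3 0 1
```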
library(microbenchmark)
library(compiler)
library(dplyr)
library(data.table)
x <- c(0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0)
# Cumulative sum minus the cumulative sum at the most recent 0
fun <- function(x) {
  cs <- cumsum(x)
  cs - cummax((x == 0) * cs)
}
funC <- cmpfun(fun)
microbenchmark(
  funcioEx = fun(x),
  funComEx = funC(x),
  lapplyEx = unname(unlist(lapply(split(x, cumsum(c(0, diff(x) != 0))), cumsum))),
  dataTaEx = data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)],
  reduceEx = Reduce(function(x, y) if (y == 0) 0 else x + y, x, accumulate = TRUE)
)
I would like to optimize this procedure for my data: even with the second option (funComEx, the fastest), it takes about 3 hours.
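One direction that might be worth exploring is sketched below. It rests on assumptions the post does not state explicitly: that the reset cumulative sum runs per cell across the 34 layers, that the values are strictly 0 and 1, and that each layer can be held in memory as a matrix. Under those assumptions, the run length can be updated one whole layer at a time with `(acc + layer) * layer`, so the loop is over 34 layers rather than over cells. The name `reset_cumsum_layers` and the tiny matrices are hypothetical stand-ins, not the real rasters.

```r
# Layer-wise reset cumulative sum (a sketch, assuming 0/1 values only).
# `layers` is a list of equally sized numeric matrices, one per raster.
reset_cumsum_layers <- function(layers) {
  acc <- layers[[1]] * 0                 # run length per cell, starts at 0
  out <- vector("list", length(layers))
  for (k in seq_along(layers)) {
    # Where the layer is 1 the run grows by 1; where it is 0 it resets to 0.
    acc <- (acc + layers[[k]]) * layers[[k]]
    out[[k]] <- acc
  }
  out
}

# Hypothetical 2 x 2 stand-in for three rasters
layers <- list(
  matrix(c(1, 0, 1, 1), nrow = 2),
  matrix(c(1, 1, 0, 1), nrow = 2),
  matrix(c(1, 1, 1, 0), nrow = 2)
)
res <- reset_cumsum_layers(layers)
res[[3]]
```

If only the final layer is needed, `out` could be dropped and just `acc` returned, which would keep a single accumulator matrix in memory. Whether this beats the per-cell approach on the real data would have to be measured.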