Using the rollsum function in zoo, I am seeing NAs in place of what I would expect to be valid values. rollapply using sum works as expected, but rollsum does not:

library(zoo)
x <- c(1,2,3,NA,NA,4,5,6)
x
[1]  1  2  3 NA NA  4  5  6
rollapply(x, 3, FUN=sum, fill=NA)
[1] NA  6 NA NA NA NA 15 NA
rollsum(x, 3, fill=NA)
[1] NA  6 NA NA NA NA NA NA

Am I missing something, or is this a bug in the optimization rollsum is using?

Hack-R
andrew
  • `?zoo::rollsum` states that `rollmean` does not handle `NA`s -- probably due to using `cumsum`. Seems to be the case for `rollsum` too. – alexis_laz Oct 11 '16 at 19:28
  • Makes sense. Odd that it produces an incorrect result instead of an error. – andrew Oct 11 '16 at 19:34
  • Maybe `RcppRoll::roll_sum` could be an alternative for you – Rentrop Oct 11 '16 at 20:08
  • @Floo0 It is several orders of magnitude faster than rollapply, thanks for the tip. For those interested, a benchmark of zoo::rollapply vs RcppRoll::roll_sum: http://stackoverflow.com/a/21371399/736755 – andrew Oct 11 '16 at 20:59
  • `rollsum` is an optimized version of `rollapply(..., sum)` that provides speed while still being written in 100% R, but in exchange it does not handle NAs. The help file says that `rollmean` does not handle `NA` values, but it should have said that neither `rollmean` nor `rollsum` does. If you want to give it NA values, use `rollapply`. – G. Grothendieck Oct 11 '16 at 20:59
  • Achim just mentioned to me that he has fixed the help file in the development version of zoo to now refer to `rollsum` and not just `rollmean`. – G. Grothendieck Oct 11 '16 at 21:15

2 Answers

The default methods of rollmean and rollsum do not handle inputs that contain NAs. In such cases, use rollapply instead.

Ning

rollsum is defined within rollmean.R as follows:

rollsum <- function(x, k, fill = if (na.pad) NA, na.pad = FALSE, 
    align = c("center", "left", "right"), ...) {
    UseMethod("rollsum")
}

where the method is:

rollsum.zoo <- function(x, k, fill = if (na.pad) NA, na.pad = FALSE, 
    align = c("center", "left", "right"), ...) {

  if (!missing(na.pad)) warning("na.pad is deprecated. Use fill.")

  align <- match.arg(align)

  if (length(dim(x)) == 2) {
      # merge is the only zoo specific part of this method

      out <- do.call("merge", c(lapply(1:NCOL(x), function(i) {
        rollsum(x[, i, drop = TRUE], k, fill = fill, align = align, ...)
      }), all = FALSE))
      if (ncol(x) == 1) dim(out) <- c(length(out), 1)
      colnames(out) <- colnames(x)
      return(out)
  }

  n <- length(x)
  stopifnot(k <= n)

  ix <- switch(align,
      "left" = { 1:(n-k+1) },
      "center" = { floor((1+k)/2):ceiling(n-k/2) },
      "right" = { k:n })

  xu <- unclass(x)
  y <- xu[k:n] - xu[c(1, seq_len(n-k))] # difference from previous
  y[1] <- sum(xu[1:k])       # find the first
  # sum precomputed differences
  rval <- cumsum(y)

  x[ix] <- rval
  na.fill(x, fill = fill, ix)

}

If you step through the function, you'll see that cumsum is not the first cause of the NA where you'd expect 15. The subtraction that computes the window-to-window differences already returns NA for every window in which an NA enters or leaves, and cumsum then carries the first NA forward into all later sums. The line in question is

y <- xu[k:n] - xu[c(1, seq_len(n-k))].
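
A minimal sketch of that step-through in base R, using the x from the question with k = 3. It only re-runs the three arithmetic lines from the method quoted above by hand (it does not call zoo), so treat it as an illustration rather than the package's exact code path:

```r
# Re-run the arithmetic from the quoted method by hand (base R only).
x <- c(1, 2, 3, NA, NA, 4, 5, 6)
k <- 3
n <- length(x)

# Element entering each window minus the element leaving it
y <- x[k:n] - x[c(1, seq_len(n - k))]
y[1] <- sum(x[1:k])   # sum of the first full window
y
# 6 NA NA 1 NA NA  (the NAs appear here, before cumsum runs)

# cumsum carries the first NA into every later position
rval <- cumsum(y)
rval
# 6 NA NA NA NA NA
```

Note that y[4] is a valid 1, yet rval[4] is still NA: once a running total hits an NA it cannot recover, which is why the window 4, 5, 6 never yields 15.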

rollsum is a new function in the zoo package and doesn't yet handle NAs well, so I'd suggest staying with rollapply.
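
As a rough sketch of what staying with rollapply can look like: rollapply passes extra arguments on to FUN, so windows that contain an NA stay NA by default, or you can pass na.rm = TRUE to sum only the non-missing values. Whether skipping NAs is appropriate is an assumption about your data, not something the question settles:

```r
library(zoo)

x <- c(1, 2, 3, NA, NA, 4, 5, 6)

# Windows containing an NA give NA, but later complete windows recover
strict <- rollapply(x, 3, FUN = sum, fill = NA)
strict
# NA 6 NA NA NA NA 15 NA

# na.rm = TRUE is forwarded to sum(), so each window sums its non-NA values
lenient <- rollapply(x, 3, FUN = sum, na.rm = TRUE, fill = NA)
lenient
# NA 6 5 3 4 9 15 NA
```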

Hack-R