I'd like to modify the effect of the maxgap argument of the .fill_short_gaps function in the R zoo package (used by na.locf and na.approx), as described in uday's comment here.
The following example illustrates existing behavior in the context of na.locf.
library(zoo)
x <- c(rep(NA, 2), 1:4, rep(NA, 4), 7:8, rep(NA, 2), 9:10)
y <- na.locf(x, na.rm = FALSE, maxgap = 2)
rbind(x, y)
This results in:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
x   NA   NA    1    2    3    4   NA   NA   NA    NA     7     8    NA    NA     9    10
y   NA   NA    1    2    3    4   NA   NA   NA    NA     7     8     8     8     9    10
However, I'd like the first maxgap values of the run of four internal NAs at positions 7:10 to be filled forward, with the rest of the run left as NA. E.g.:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
x   NA   NA    1    2    3    4   NA   NA   NA    NA     7     8    NA    NA     9    10
z   NA   NA    1    2    3    4    4    4   NA    NA     7     8     8     8     9    10
For reference, here is the .fill_short_gaps function from the zoo package:
## x = series with gaps
## fill = same series with filled gaps
.fill_short_gaps <- function(x, fill, maxgap) {
    if (maxgap <= 0)
        return(x)
    if (maxgap >= length(x))
        return(fill)
    naruns <- rle(is.na(x))
    # This is the part I want to modify. Currently sets all runs > maxgap
    # to FALSE (meaning don't fill)
    naruns$values[naruns$lengths > maxgap] <- FALSE
    naok <- inverse.rle(naruns)
    ifelse(naok, fill, x)
}
Using x as the example, naruns looks like this:
naruns
Run Length Encoding
lengths: int [1:6] 2 4 4 2 2 2
values : logi [1:6] TRUE FALSE TRUE FALSE TRUE FALSE
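The same run-length encoding can also be expanded into each element's position within its run, which caps the fill at maxgap values per gap without editing naruns at all. The following is only a minimal sketch under some assumptions: `fill_gap_heads` is a hypothetical name (it is not part of zoo), and `fill` is written out by hand as what na.locf(x, na.rm = FALSE) should return for this x, so the snippet runs without loading zoo.

```r
# Hypothetical variant of .fill_short_gaps (not part of zoo): fill only the
# first `maxgap` values of every NA run instead of skipping long runs.
fill_gap_heads <- function(x, fill, maxgap) {
  if (maxgap <= 0)
    return(x)
  naruns <- rle(is.na(x))
  # Position of each element within its own run: 1, 2, ..., run length.
  pos <- sequence(naruns$lengths)
  # Fill an element only if it is NA and among the first `maxgap` of its run.
  naok <- is.na(x) & pos <= maxgap
  ifelse(naok, fill, x)
}

x <- c(rep(NA, 2), 1:4, rep(NA, 4), 7:8, rep(NA, 2), 9:10)
# Assumption: hand-written stand-in for na.locf(x, na.rm = FALSE).
fill <- c(NA, NA, 1, 2, 3, 4, 4, 4, 4, 4, 7, 8, 8, 8, 9, 10)
z <- fill_gap_heads(x, fill, maxgap = 2)
rbind(x, z)
```

Leading NAs stay NA here because `fill` itself is NA at those positions, which matches the desired z shown earlier.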
One approach to solving my problem would involve inserting values in the appropriate places in the naruns vectors so that naok can be created correctly. This would look like:
Run Length Encoding
lengths: int [1:7] 2 4 2 2 2 2 2
values : logi [1:7] TRUE FALSE TRUE FALSE FALSE TRUE FALSE
That is, the run of 4 (TRUE) in position 3 would be split into a run of 2 (TRUE) followed by a run of 2 (FALSE). The runs to split are those identified by which(naruns$values & naruns$lengths > maxgap), but I'm not sure of a good way to insert values in the correct positions.
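One way the splitting step could be sketched is to repeat the index of each long run twice and then adjust the two resulting pieces. This is a sketch under assumptions rather than a tested patch to zoo: `split_long_runs` is a hypothetical helper name, and it relies only on base R (rle, rep, duplicated, inverse.rle).

```r
# Hypothetical helper (not part of zoo): split every NA run longer than
# `maxgap` into a fillable head of length `maxgap` and a non-fillable tail.
split_long_runs <- function(naruns, maxgap) {
  long <- naruns$values & naruns$lengths > maxgap
  # Long runs appear twice in the index, all other runs once.
  idx <- rep(seq_along(naruns$lengths), ifelse(long, 2L, 1L))
  # The second copy of a repeated index is the tail piece of a split run.
  is_tail <- duplicated(idx)
  lengths <- naruns$lengths[idx]
  values <- naruns$values[idx]
  lengths[is_tail] <- lengths[is_tail] - maxgap  # tail keeps the remainder
  lengths[long[idx] & !is_tail] <- maxgap        # head is capped at maxgap
  values[is_tail] <- FALSE                       # tail must not be filled
  structure(list(lengths = as.integer(lengths), values = values),
            class = "rle")
}

x <- c(rep(NA, 2), 1:4, rep(NA, 4), 7:8, rep(NA, 2), 9:10)
naruns2 <- split_long_runs(rle(is.na(x)), maxgap = 2)
naok <- inverse.rle(naruns2)
naruns2
```

The resulting naok could then be passed to ifelse(naok, fill, x) exactly as in the existing function.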
I've considered several clumsy approaches to doing this, but they've all hit dead ends. I know from looking at the answers to other (unrelated) questions on SO that many people can come up with something much more robust and scalable than anything I am likely to produce in a reasonable timespan. Thanks for any help.