Using rollmean when there are missing values (NA)

Question

I have a data set that contains a couple of NAs. When I take a rolling mean, I expect any window that contains no NA to produce a number rather than NA. However, rollmeanr in zoo does not seem to do this. Example:

require(zoo)
z = zoo(cbind(a=0:10, b=c(NA,10:1), c=sample(1:11,11)), 1:11) 
rollmeanr(z, k=3, fill=NA)
    a  b        c
1  NA NA       NA
2  NA NA       NA
3   1 NA 3.333333
4   2 NA 4.666667
5   3 NA 4.000000
6   4 NA 6.333333
7   5 NA 7.000000
8   6 NA 9.333333
9   7 NA 8.333333
10  8 NA 8.666667
11  9 NA 5.666667

rollapply(z, width=3, FUN=mean, by=1, by.column=TRUE, fill=NA, align="right")
    a  b        c
1  NA NA       NA
2  NA NA       NA
3   1 NA 3.333333
4   2  9 4.666667
5   3  8 4.000000
6   4  7 6.333333
7   5  6 7.000000
8   6  5 9.333333
9   7  4 8.333333
10  8  3 8.666667
11  9  2 5.666667

I would expect these two calls to generate the same result. Please comment. Some session info:

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] zoo_1.7-10

loaded via a namespace (and not attached):
 [1] grid_3.0.1      lattice_0.20-15

From the help file I have : `The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.` — dickoa, Jul 20 '13 at 18:05
Yes, I saw that. I assumed that It would just not allow you to skip over NA as rollapply allows you to pass na.rm=TRUE. Should that be read as it breaks when there are NA? — Alex, Jul 20 '13 at 18:08

score 16 · Accepted Answer · answered Jul 20 '13 at 18:07

From ?rollmean

The default method of ‘rollmean’ does not handle inputs that contain ‘NA’s. In such cases, use ‘rollapply’ instead.
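
As a minimal sketch of that advice (the series `b` below is simply the question's column b rebuilt on its own, not something taken from the help file), rollapplyr with mean returns a number for every complete window that contains no NA, and NA only where the window still covers the missing value:

```r
library(zoo)

# Column b from the question: one NA followed by 10, 9, ..., 1.
b <- zoo(c(NA, 10:1), 1:11)

# Right-aligned rolling mean over 3 observations.
# fill = NA pads the first two positions, where no full window exists.
r <- rollapplyr(b, 3, mean, fill = NA)

print(r)
# Positions 1-2 are padding, position 3 is NA because its window
# still contains the missing value, and positions 4-11 are 9, 8, ..., 2.
```

This reproduces column b of the rollapply output shown in the question.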

GSee


Yes, I saw that. I assumed it only meant that you cannot skip over NAs, whereas rollapply lets you pass na.rm=TRUE. Should it instead be read as "it breaks when there are NAs"? – Alex Jul 20 '13 at 18:09
Look at `zoo:::rollmean.zoo` and note that `na.rm` is not passed anywhere. – GSee Jul 20 '13 at 18:11
Yeah, that's not what I was saying, though. I thought `na.rm=FALSE` would be the default and that you can't change it in `rollmean`, whereas you can in `rollapply`. That's what I understood the help file to be saying. Obviously I was incorrect. – Alex Jul 20 '13 at 18:15
You could always use the 'filter' function. It has no problems with NAs and is very fast. – George Steblovsky Jul 20 '13 at 19:12
@GeorgeSteblovsky Yes, `as.zoo(apply(z, 2, function(x) filter(x, rep(1/3, 3), sides=1)))` is about 9 times faster in this case. – GSee Jul 20 '13 at 19:48
@GeorgeSteblovsky while `NA`s are *allowed* using `filter()`, they pollute the output much more than they do in `rollapply` - try `x <- c(5, 7, 10, NA, 3, 6, 2, NA, 1, 9); as.numeric(filter(x, rep(1/3, 3))); zoo::rollapply(x, 3, mean, na.rm=TRUE)` and compare the output. – Ken Williams Oct 10 '17 at 22:01
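
A small sketch of the comparison suggested in the comment above, using the same vector `x`. The names `f` and `r` are only illustrative, and `stats::filter` is written out in full on the assumption that another attached package might mask `filter`:

```r
library(zoo)

x <- c(5, 7, 10, NA, 3, 6, 2, NA, 1, 9)

# Centered moving average with stats::filter: any window that touches
# an NA (or runs off either end of the vector) comes back as NA.
f <- as.numeric(stats::filter(x, rep(1/3, 3)))

# rollapply with na.rm = TRUE averages whatever non-missing values
# each complete window contains, so no NA appears in the result.
r <- as.numeric(zoo::rollapply(x, 3, mean, na.rm = TRUE))

print(f)  # only positions 2 and 6 hold a number
print(r)  # 8 values, one per complete window, none of them NA
```

Note that the two results also differ in length: `filter` keeps all 10 positions, while `rollapply` without `fill` returns one value per complete window.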

score 6 · Answer 2 · JKim · 2017-03-06T08:23:47.663

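Use rollapply with 'na.rm=TRUE' passed to mean, together with the 'partial=TRUE' option. 'na.rm=TRUE' skips the NAs inside each window, and 'partial=TRUE' also computes a mean for the incomplete windows at the start of the series.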

> rollapply(z, width=3, FUN=function(x) mean(x, na.rm=TRUE), by=1, by.column=TRUE, partial=TRUE, fill=NA, align="right")

     a    b        c
1  0.0  NaN 1.000000
2  0.5 10.0 5.500000
3  1.0  9.5 4.333333
4  2.0  9.0 6.666667
5  3.0  8.0 4.666667
6  4.0  7.0 6.000000
7  5.0  6.0 7.000000
8  6.0  5.0 8.666667
9  7.0  4.0 8.333333
10 8.0  3.0 7.000000
11 9.0  2.0 5.000000

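The 'NaN' in the first row of column b appears because that window contains only NA, so mean(x, na.rm=TRUE) averages an empty vector. Changing 'fill=NA' to 'fill=0' does not replace it: 'fill' only supplies values for positions where the function is not evaluated, and with 'partial=TRUE' every position is evaluated. To get '0' there, replace the NaN values in the result afterwards.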
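
A minimal sketch of where the 'NaN' comes from and one way to replace it after the fact, using only column b of the example (the names `b`, `r` and `r0` are illustrative):

```r
library(zoo)

# Column b from the example: one NA followed by 10, 9, ..., 1.
b <- zoo(c(NA, 10:1), 1:11)

# Right-aligned rolling mean that skips NAs and also evaluates the
# shorter windows at the start of the series.
r <- rollapplyr(b, 3, mean, na.rm = TRUE, partial = TRUE)

# The first window holds only the NA, so after na.rm = TRUE the mean
# is taken over an empty vector, which gives NaN.
print(mean(numeric(0)))

# Replace NaN by 0 in a copy of the result.
r0 <- r
coredata(r0)[is.nan(coredata(r0))] <- 0
print(r0)
```

Whether 0 is a sensible stand-in for "no data in this window" depends on the analysis; leaving the value as NA is often the safer choice.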

edited Mar 06 '17 at 08:23

answered Mar 06 '17 at 04:19

JKim

or equivalently: `rollapplyr(z, 3, mean, na.rm = TRUE, by = 1, partial = TRUE, fill = NA)` – G. Grothendieck Mar 06 '17 at 16:33
Is it possible to calculate the mean for cells only when the original value was NA? In other words can original values be kept while imputing averages within the given window only where the original values were NA? Similar to na.fill(x, 'extend') but with a limit to which it 'extends' being the window or width. – Nebulloyd May 19 '22 at 03:13
@Nebulloyd I think your question is about 'mean imputation'. https://statisticsglobe.com/mean-imputation-for-missing-data/ – JKim May 20 '22 at 04:23
@JKim Unfortunately no I am not. The key difference is that the mean should only be calculated over a small 'window' in the column. The examples in your link fill all NAs of a column with the same mean value (column mean). I am using a time series data set so I expect the values directly before or after to be more similar to missing NAs than the column mean. I also want long streaks of NA larger than a certain value to remain NA. – Nebulloyd May 20 '22 at 05:08
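
One possible sketch of what this comment describes, not an answer given in the thread: compute a centered rolling mean that skips NAs, then copy it only into the positions that were originally NA. The data `x`, the window width of 3 and all variable names are assumptions made for illustration. A gap is left as NA wherever its window contains no observed value at all.

```r
library(zoo)

# Hypothetical series with a single NA and a run of three NAs.
x <- zoo(c(1, NA, 3, 4, NA, NA, NA, 8, 9))

# Centered rolling mean over 3 observations, ignoring NAs; partial = TRUE
# also evaluates the shortened windows at both ends.
m <- coredata(rollapply(x, 3, mean, na.rm = TRUE, partial = TRUE))

# A window made up entirely of NAs yields NaN; treat that as "still missing".
m[is.nan(m)] <- NA

# Keep every original value and fill only the originally missing positions.
vals <- coredata(x)
missing_idx <- is.na(vals)
vals[missing_idx] <- m[missing_idx]

filled <- zoo(vals, index(x))
print(filled)
```

With this width, the middle of the three-NA run stays NA because its window holds no data, while the two outer gaps take the value of their single observed neighbour. A wider window would fill longer runs.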

score 0 · Answer 3 · answered Jul 24 '17 at 21:14

For completeness, rollsum does not handle inputs that contain ‘NA’s either. In such cases, use ‘rollapply’ instead.

Ning
