0

Hi I have a df and How to calculate rolling mean for previous N rows ,In my case it is 3

df <- data.frame(X=c(24,NA,NA,45,NA,20,24,10,40,20,20), Y=c(10,NA,14,14,14,10,NA,10,14,14,14))

I tried to get o/p as below

library(data.table)
library(zoo)
df[, Rolling.Average := rollmeanr(X, 3, fill = NA), by = Y]

o/p:

X   Y  Rolling.Average
24  10  NA
NA  NA  NA
NA  14  NA
45  14  NA
NA  14  45
20  10  NA
24  NA  NA
10  10  20
40  14  NA
20  14  40
20  14  30

Error: k <= n is not TRUE Any modifications in above code,please suggest Thanks

Srm Murty
  • 135
  • 8
  • Related: [https://stackoverflow.com/questions/743812/calculating-moving-average](https://stackoverflow.com/questions/743812/calculating-moving-average) – YCR Jul 13 '17 at 11:02
  • Thanks AEBilgrau, the expected output is if the value in Y apears in last 3 rows then the average of X for that value in Roll.Ave i.e- for 14 in row id 3 we dont have same value in previous 3 rows, and for row-id 4 it appears in previous 3 rows and avg is NA,same for last row for 14 avg of for same value in previous 3 rows is 30 – Srm Murty Jul 13 '17 at 11:04
  • (My previous comment was gibberish, so here it is restated.) What exactly is your expected output? How should `Y` equal to `NA` be interpreted with the `by` argument? And what do you expect the average should be with a right-aligned window when the first entries are `NA` for `Y` – Anders Ellern Bilgrau Jul 13 '17 at 11:11
  • 1
    Depending on what you want use `rollapplyr(X, 3, mean, partial = TRUE)` or `rollapplyr(X, 3, mean, partial = TRUE, na.rm = TRUE)` or `rollapplyr(c(NA, NA, X), 3, mean)` or `rollapplyr(c(NA, NA, X), 3, mean, na.rm = TRUE)` – G. Grothendieck Jul 13 '17 at 11:52
  • Thanks Grothendieck, rollapplyr(X, 3, mean, partial = TRUE, na.rm = TRUE) is giving answer – Srm Murty Jul 13 '17 at 12:30
  • Also, instead of calculates last 3 value of same value, can we modify it to previous N rows? – Srm Murty Jul 13 '17 at 12:52

1 Answers1

0

The error is thrown as you have less that 3 rows for the "NA" group -- which data.table seems to interpret as a group.

So, your data is expected to have at least three entries for each by-group. Then the code works (I have added a row below to your example below):

df <- data.frame(X=c(24,NA,NA,45,NA,20,24,10,40,20,20,20),
                 Y=c(10,NA,14,14,14,10,NA,10,14,14,14,NA))

library(data.table)
library(zoo)
df <- as.data.table(df)
df[, Rolling.Average := rollmeanr(X, k = 3, fill = NA), by = Y]
print(df)
#     X  Y Rolling.Average
# 1: 24 10              NA
# 2: NA NA              NA
# 3: NA 14              NA
# 4: 45 14              NA
# 5: NA 14              NA
# 6: 20 10              NA
# 7: 24 NA              NA
# 8: 10 10              18
# 9: 40 14              NA
#10: 20 14              NA
#11: 20 14              NA
#12: 20 NA              NA

Or, you can exclude the NAs of the by-clause by something like:

df[!is.na(Y), Rolling.Average := rollmeanr(X, k = 3, fill = NA), by = Y]

Which yields the same output. It does not match your expected output, but I do not really understand how one should arrive at that.

Anders Ellern Bilgrau
  • 9,928
  • 1
  • 30
  • 37