How to calculate the rolling mean of the nth previous values

Question

So let's say I have a data frame of a series of values which are assigned to one of two groups ('Gp'):

set.seed(12)
df <- data.frame(id = sample(1:50,50), Gp = sample(2, 50, TRUE))

Here are the top 20 values from this:

And what I would like to do is take the rolling average of values 10, 11 and 12 places before the current row, and then subtract the current (row) value from it. I would also like to fill with NA, group by another column ('Gp') and align right.

I've seen some other posts and have been able to do this with dplyr minus the nth previous values part:

df2 <- df %>% group_by(Gp) %>%
  mutate( rm = rollmeanr(id,k = 3, fill = NA)) %>%
  ungroup

But this just goes by the current row and the 2 rows before it. In the example above, what I want to do is, for example for row 15:

row 15 - (row 5 + row 4 + row 3)/3

= 10 - (8 + 12 + 46) / 3 = -12

And I would like it to be placed in a new column appended to the original df with a new name. I would also like to use dplyr if possible.

For this example, this row of the desired output df is:

row id Gp rm
15 10  1 -12

I'm thinking lag() may be helpful, but will need the 3 lagged values together and can see this getting messy without cleaner code.

Similar questions:

Calculating Mean for previous 6 nth data point

Moving mean for the following days not including the current day

Calculate the moving average of nth data points over a time series with NAs r

Sorry so in the original example id would be error_count; I updated it — SqueakyBeak, Jul 11 '22 at 16:05
Perfect. Unfortunately, I cannot replicate your data with the seed either. — harre, Jul 11 '22 at 16:08
Your description says `rolling average of values #10, 11 and 12, and then subtract the current (row) value from it`, but the example shows row 15 subtracted from average of row 5, 4, 3. In addition, as others mentioned, the set.seed data is not the same as you showed — akrun, Jul 11 '22 at 16:14
Yes I mean if Im on row 15, I want to take rolling average of the values 10/11/12 places before it from the value of row 15 — SqueakyBeak, Jul 11 '22 at 16:17
In this context, align right means that the counting starts from current row (ex is 15) and then working on subsequent rows, (ex row 16) incrementing everything 1 row forward (so then the 3 values in the rolling average would be 11, 12 and 13 values previous to it) — SqueakyBeak, Jul 11 '22 at 16:18
@SqueakyBeak so, is it minimum 10 rows before from the ith row — akrun, Jul 11 '22 at 16:19
But doesn't that mean that the rolling average would always based on the same three observations? I.e. `df |> mutate(id_lag = lag(id, 10), rm = id_lag - rollmeanr(id_lag, k = 3, fill = NA))` would not help you. — harre, Jul 11 '22 at 16:22
So the number that increments row number increments the 3 numbers I want to be averaged. So if its row 16, the 3 numbers to be averaged are rows 6,5,4 instead of 5,4,3. Maybe rolling mean isnt right for this one? — SqueakyBeak, Jul 11 '22 at 16:25
If `5, 4, 3` for row `15` then `6, 5, 4` for row `16`, right? — harre, Jul 11 '22 at 16:27
Try `df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ { i1 <- (.x -12):(.x - 10) id[.x] - mean(id[i1[i1 > 0]], na.rm = TRUE) }))` — akrun, Jul 11 '22 at 16:28
I tried this: df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ {i1 <- (.x -12):(.x - 10)id[.x] - mean(id[i1[i1 > 0]], na.rm = TRUE)})) ...and am getting this error: Error: unexpected symbol in "df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ {i1 <- (.x -12):(.x - 10)id" — SqueakyBeak, Jul 11 '22 at 16:31
I guess you need to add the linebreaks `;` `df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ { i1 <- (.x -12):(.x - 10) ; id[.x] - mean(id[i1[i1 > 0]], na.rm = TRUE) }))` — akrun, Jul 11 '22 at 16:33
I think this works but I'm preferring the solution by harre as its easier to read/verify. Thank you for your help though! — SqueakyBeak, Jul 11 '22 at 16:40

harre · Accepted Answer · 2022-07-11T16:41:43.180

1

You could lag the id variable before taking using the rolling mean:

library(dplyr)

df |> 
  group_by(Gp) |>
  mutate(rm = id - zoo::rollmeanr(lag(id, 10), k = 3, fill = NA)) |>
  ungroup()

Update: Typo + added group_by.

edited Jul 11 '22 at 16:41

answered Jul 11 '22 at 16:29

harre

7,081
2
16
28

I think this works, let me take a look at the other solutions and will finalize after that. Thank you – SqueakyBeak Jul 11 '22 at 16:33
1

This works as long as I add the %>% group_by(Gp) %>% after df %>%. Thank you! – SqueakyBeak Jul 11 '22 at 16:40

How to calculate the rolling mean of the nth previous values

1 Answers1