0

So let's say I have a data frame of a series of values which are assigned to one of two groups ('Gp'):

set.seed(12)
df <- data.frame(id = sample(1:50,50), Gp = sample(2, 50, TRUE)) 

Here are the top 20 values from this:

df
row  id Gp
1   4  1
2  41  1
3  46  1
4  13  1
5   8  2
6   2  2
7  48  2
8  28  2
9   1  2
10 42  2
11 16  2
12 32  1
13 15  2
14 38  2
15 10  1
16 40  1
17 35  1
18 18  2
19 22  1
20 50  2

And what I would like to do is take the rolling average of values 10, 11 and 12 places before the current row, and then subtract the current (row) value from it. I would also like to fill with NA, group by another column ('Gp') and align right.

I've seen some other posts and have been able to do this with dplyr minus the nth previous values part:

df2 <- df %>% group_by(Gp) %>%
  mutate( rm = rollmeanr(id,k = 3, fill = NA)) %>%
  ungroup

But this just goes by the current row and the 2 rows before it. In the example above, what I want to do is, for example for row 15:

row 15 - (row 5 + row 4 + row 3)/3

= 10 - (8 + 12 + 46) / 3 = -12

And I would like it to be placed in a new column appended to the original df with a new name. I would also like to use dplyr if possible.

For this example, this row of the desired output df is:

row id Gp rm
15 10  1 -12

I'm thinking lag() may be helpful, but will need the 3 lagged values together and can see this getting messy without cleaner code.

Similar questions:

Calculating Mean for previous 6 nth data point

Moving mean for the following days not including the current day

Calculate the moving average of nth data points over a time series with NAs r

SqueakyBeak
  • 366
  • 4
  • 15
  • 1
    Can you post the data for `error_count` as well? – harre Jul 11 '22 at 16:04
  • Sorry so in the original example id would be error_count; I updated it – SqueakyBeak Jul 11 '22 at 16:05
  • Perfect. Unfortunately, I cannot replicate your data with the seed either. – harre Jul 11 '22 at 16:08
  • + What does **align right** mean in this context? – harre Jul 11 '22 at 16:10
  • Your description says `rolling average of values #10, 11 and 12, and then subtract the current (row) value from it`, but the example shows row 15 subtracted from average of row 5, 4, 3. In addition, as others mentioned, the set.seed data is not the same as you showed – akrun Jul 11 '22 at 16:14
  • Yes I mean if Im on row 15, I want to take rolling average of the values 10/11/12 places before it from the value of row 15 – SqueakyBeak Jul 11 '22 at 16:17
  • In this context, align right means that the counting starts from current row (ex is 15) and then working on subsequent rows, (ex row 16) incrementing everything 1 row forward (so then the 3 values in the rolling average would be 11, 12 and 13 values previous to it) – SqueakyBeak Jul 11 '22 at 16:18
  • @SqueakyBeak so, is it minimum 10 rows before from the ith row – akrun Jul 11 '22 at 16:19
  • Yes sorry, language updated – SqueakyBeak Jul 11 '22 at 16:20
  • But doesn't that mean that the rolling average would always based on the same three observations? I.e. `df |> mutate(id_lag = lag(id, 10), rm = id_lag - rollmeanr(id_lag, k = 3, fill = NA))` would not help you. – harre Jul 11 '22 at 16:22
  • So the number that increments row number increments the 3 numbers I want to be averaged. So if its row 16, the 3 numbers to be averaged are rows 6,5,4 instead of 5,4,3. Maybe rolling mean isnt right for this one? – SqueakyBeak Jul 11 '22 at 16:25
  • If `5, 4, 3` for row `15` then `6, 5, 4` for row `16`, right? – harre Jul 11 '22 at 16:27
  • Ah crap yes you're right sorry! – SqueakyBeak Jul 11 '22 at 16:27
  • Try `df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ { i1 <- (.x -12):(.x - 10) id[.x] - mean(id[i1[i1 > 0]], na.rm = TRUE) }))` – akrun Jul 11 '22 at 16:28
  • I tried this: df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ {i1 <- (.x -12):(.x - 10)id[.x] - mean(id[i1[i1 > 0]], na.rm = TRUE)})) ...and am getting this error: Error: unexpected symbol in "df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ {i1 <- (.x -12):(.x - 10)id" – SqueakyBeak Jul 11 '22 at 16:31
  • I guess you need to add the linebreaks `;` `df %>% group_by(Gp) %>% mutate(rm = map_dbl(row_number(), ~ { i1 <- (.x -12):(.x - 10) ; id[.x] - mean(id[i1[i1 > 0]], na.rm = TRUE) }))` – akrun Jul 11 '22 at 16:33
  • I think this works but I'm preferring the solution by harre as its easier to read/verify. Thank you for your help though! – SqueakyBeak Jul 11 '22 at 16:40

1 Answers1

1

You could lag the id variable before taking using the rolling mean:

library(dplyr)

df |> 
  group_by(Gp) |>
  mutate(rm = id - zoo::rollmeanr(lag(id, 10), k = 3, fill = NA)) |>
  ungroup()

Update: Typo + added group_by.

harre
  • 7,081
  • 2
  • 16
  • 28