
I am still quite new to R and am trying to figure out what I am doing wrong in the code below.

I am trying to calculate the expanding mean over time per subgroup for a dataframe. My code works when the dataframe contains only a single subgroup, but it breaks when the dataframe contains multiple subgroups.

Apologies if I have overlooked something, but I can't figure out where exactly my code is incorrect. My hunch is that I am not specifying the width correctly, but I have not been able to work out how to make the width a dynamically expanding window over time per subgroup.

My data: sample file

My code:

library(ggplot2)
library(zoo)
library(RcppRoll)
library(dplyr)

x <- read.csv("stackoverflow.csv")

x$datatime <- as.POSIXlt(x$datatime,format="%m/%d/%Y %H:%M",tz=Sys.timezone())
x$Event <- as.factor(x$Event)

x2 <- arrange(x,x$Event,x$datatime) %>% 
  group_by(x$Event) %>% 
  mutate(ma=rollapply(data = x$Actual, width=seq_along(x$Actual), FUN=mean,
                          partial=TRUE, fill=NA,
                          align = "right"))

Any help is very much appreciated!

Thanks

EDIT:

A fix has been found! Thanks to all the useful feedback.

The working code is:

x <- 
  arrange(x,x$Event,x$datatime) %>% 
  group_by(Event) %>% 
  mutate(ma=rollapply(data = Actual, 
                      width=seq_along(Actual), 
                      FUN=mean,
                      partial=TRUE, 
                      fill=NA,
                      align = "right"))
cehss
  • Without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) it's difficult to say but adding a `group_by(Event)` before the `mutate`-statement might do the trick. – tivd Feb 12 '22 at 10:24
  • Hi @tivd, thanks for the link, very useful! I adjusted my initial post as well. When I try using `group_by(Event)`, it tells me that my window length is too long: Error: Problem with `mutate()` column `ma`. i `ma = rollapply(...)`. i `ma` must be size 9 or 1, not 19. i The error occurred in group 1: x$Event = Labour Costs YoY. How would I create a dynamic window that adjusts appropriately per group? I added a screenshot to my initial post as well, as I was unable to copy a sample subset into my comment. Thanks! – cehss Feb 12 '22 at 10:55
  • Try removing all the `x$`'s from your dplyr function calls, and just refer to the columns by their names. Using e.g. `x$Actual` in `mutate()` causes you to look up the full length vector in the original data, rather than using the slice in the group. – Mikko Marttila Feb 12 '22 at 11:01

1 Answer

I think the problem here is that you're using `x$` to extract columns from the original data in `mutate()`, rather than using the column name directly to refer to the column in the grouped slice. In dplyr verbs you can (and in the case of grouped operations, must) refer to the columns directly. The solution is to remove all `x$` references from your dplyr function calls.

Here’s a small example that illustrates what’s going on:

library(dplyr, warn.conflicts = FALSE)

tbl <- tibble(g = c(1, 1, 2, 2, 2), x = 1:5)
tbl
#> # A tibble: 5 x 2
#>       g     x
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     2     3
#> 4     2     4
#> 5     2     5

tbl %>% 
  group_by(g) %>% 
  mutate(y = cumsum(tbl$x))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `y`.
#> i `y = cumsum(tbl$x)`.
#> i `y` must be size 2 or 1, not 5.
#> i The error occurred in group 1: g = 1.

And how to fix it:

tbl %>% 
  group_by(g) %>% 
  mutate(y = cumsum(x))
#> # A tibble: 5 x 3
#> # Groups:   g [2]
#>       g     x     y
#>   <dbl> <int> <int>
#> 1     1     1     1
#> 2     1     2     3
#> 3     2     3     3
#> 4     2     4     7
#> 5     2     5    12
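
The same fix carries over to the expanding mean from the question. What follows is only a minimal sketch on made-up data: the column names `Event` and `Actual` mirror the question, but the values and the `toy` tibble are hypothetical, and `dplyr::cummean()` is used here purely as an independent cross-check rather than as part of the original code.

```r
library(dplyr, warn.conflicts = FALSE)
library(zoo, warn.conflicts = FALSE)

# Hypothetical toy data: two groups of different sizes.
toy <- tibble(
  Event  = c("A", "A", "A", "B", "B"),
  Actual = c(2, 4, 6, 10, 20)
)

res <- toy %>%
  group_by(Event) %>%
  mutate(
    # Bare column names refer to the current group's slice, so
    # seq_along(Actual) yields widths 1, 2, ..., n within each group.
    ma = rollapply(Actual, width = seq_along(Actual), FUN = mean,
                   partial = TRUE, fill = NA, align = "right"),
    # Cross-check: cumulative mean computed by dplyr within the group.
    ma_check = cummean(Actual)
  ) %>%
  ungroup()

res
```

Both `ma` and `ma_check` should restart at the first row of each group, which is the behaviour that `x$Actual` prevented.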
Mikko Marttila
  • Hi @Mikko, thanks for the help! Your advice was really helpful, and the eventual working code is: `x <- arrange(x,x$Event,x$datatime) %>% group_by(Event) %>% mutate(ma=rollapply(data = Actual, width=seq_along(Actual), FUN=mean, partial=TRUE, fill=NA, align = "right"))` – cehss Feb 12 '22 at 13:02
  • `rollapplyr`, with an r on the end, can be used to default to right alignment. – G. Grothendieck Feb 12 '22 at 15:21
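
To illustrate the last comment, here is a small sketch comparing the two spellings on a plain vector. The vector `v` is made up for the example, and the sketch assumes only that zoo is installed.

```r
library(zoo, warn.conflicts = FALSE)

v <- c(2, 4, 6, 8)  # hypothetical values

# Explicit right alignment, as in the question's code.
a <- rollapply(v, width = seq_along(v), FUN = mean, align = "right")

# rollapplyr() is the same call with align = "right" as its default.
b <- rollapplyr(v, width = seq_along(v), FUN = mean)

all.equal(a, b)
```

Inside a grouped `mutate()`, `rollapplyr(Actual, seq_along(Actual), mean)` would then be a shorter way to write the expanding mean.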