
I have a data frame like the one below (sample data). For each day I want to add two columns showing the average and the standard deviation of sales on the same weekday over the last 3 weeks. By this I mean the 3 previous days with the same weekday (the last 3 Tuesdays, the last 3 Wednesdays, etc.).

df <- data.frame(
  stringsAsFactors = FALSE,
              date = c("3/28/2019","3/27/2019",
                       "3/26/2019","3/25/2019","3/24/2019","3/23/2019",
                       "3/22/2019","3/21/2019","3/20/2019","3/19/2019","3/18/2019",
                       "3/17/2019","3/16/2019","3/15/2019","3/14/2019",
                       "3/13/2019","3/12/2020","3/11/2020","3/10/2020","3/9/2021",
                       "3/8/2021","3/7/2021","3/6/2022","3/5/2022",
                       "3/4/2022","3/3/2023"),
           weekday = c(4L,3L,2L,1L,7L,6L,5L,4L,
                       3L,2L,1L,7L,6L,5L,4L,3L,2L,1L,7L,6L,5L,4L,
                       3L,2L,1L,7L),
          store_id = c(344L,344L,344L,344L,344L,
                       344L,344L,344L,344L,344L,344L,344L,344L,344L,344L,
                       344L,344L,344L,344L,344L,344L,344L,344L,344L,
                       344L,344L),
       store_sales = c(1312005L,1369065L,1354185L,
                       1339183L,973780L,1112763L,1378349L,1331890L,1357713L,
                       1366399L,1303573L,936919L,1099826L,1406752L,
                       1318841L,1321099L,1387767L,1281097L,873449L,1003667L,
                       1387767L,1281097L,873449L,1003667L,1331636L,1303804L)
)

For example, for 3/28/2019 the average should be taken over the sales of 3/21/2019, 3/14/2019 and 3/7/2021, like this:

date       weekday  store_id  store_sales  avg_sameday3
3/28/2019  4        344       1312005      1310609
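You can check the expected value directly from the three sales figures in the sample data. These are the 3/21/2019, 3/14/2019 and 3/7/2021 rows, all weekday 4 and store 344. The following is a small base R sketch, and the variable names are only illustrative. The same three values also give the standard deviation that the question asks for.

```r
# Sales on the three earlier weekday-4 rows used in the example,
# copied from the sample data: 3/21/2019, 3/14/2019 and 3/7/2021
prev_sales <- c(1331890, 1318841, 1281097)

avg_sameday3 <- mean(prev_sales)  # 3931828 / 3 = 1310609.33
sd_sameday3  <- sd(prev_sales)    # sample standard deviation of the same 3 values

round(avg_sameday3)
#> [1] 1310609
```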
DanG

1 Answer


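We can group by store_id and weekday and calculate a rolling mean over the previous 3 entries of each group using zoo::rollapply. The rows are ordered from newest to oldest, so the previous weeks come after the current row within each group.

library(dplyr)

df %>%
  group_by(store_id, weekday) %>%
  mutate(store_sales_avg = zoo::rollapply(store_sales, 4,
                                          function(x) mean(x[-1]),
                                          align = "left", partial = TRUE)) %>%
  ungroup()

Note that I have used a window size of 4 and removed the first entry from the mean calculation. With align = "left" the window starts at the current row, so x[-1] drops the current value and averages the same weekday in the 3 previous weeks. For 3/28/2019 this gives 1310609.33, the value expected in the question. With partial = TRUE the mean is still taken when fewer than 3 previous weeks are available. The oldest row of each weekday has no history and gets NaN.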
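The rolling mean covers the average, but the question also asks for the standard deviation. Below is a minimal sketch that extends the same idea with dplyr and zoo to produce both columns. It assumes, as in the sample data, that rows are ordered newest first within each store_id/weekday group. The helper prev3() and the column names avg_sameday3/sd_sameday3 are hypothetical names chosen for illustration.

```r
library(dplyr)
library(zoo)

# Sample data from the question; rows are ordered newest first
df <- data.frame(
  stringsAsFactors = FALSE,
  date = c("3/28/2019", "3/27/2019", "3/26/2019", "3/25/2019", "3/24/2019",
           "3/23/2019", "3/22/2019", "3/21/2019", "3/20/2019", "3/19/2019",
           "3/18/2019", "3/17/2019", "3/16/2019", "3/15/2019", "3/14/2019",
           "3/13/2019", "3/12/2020", "3/11/2020", "3/10/2020", "3/9/2021",
           "3/8/2021", "3/7/2021", "3/6/2022", "3/5/2022", "3/4/2022",
           "3/3/2023"),
  weekday = c(4L, 3L, 2L, 1L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 7L, 6L, 5L, 4L,
              3L, 2L, 1L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 7L),
  store_id = rep(344L, 26),
  store_sales = c(1312005L, 1369065L, 1354185L, 1339183L, 973780L, 1112763L,
                  1378349L, 1331890L, 1357713L, 1366399L, 1303573L, 936919L,
                  1099826L, 1406752L, 1318841L, 1321099L, 1387767L, 1281097L,
                  873449L, 1003667L, 1387767L, 1281097L, 873449L, 1003667L,
                  1331636L, 1303804L)
)

# Hypothetical helper: wrap a summary function f so it is applied to the
# values after the current one. With align = "left", x[1] is the current row,
# so x[-1] holds the same weekday in up to 3 earlier weeks.
# Rows with no earlier weeks get NA.
prev3 <- function(f) {
  function(x) if (length(x) > 1) f(x[-1]) else NA_real_
}

res <- df %>%
  group_by(store_id, weekday) %>%
  mutate(
    avg_sameday3 = rollapply(store_sales, 4, prev3(mean),
                             align = "left", partial = TRUE),
    sd_sameday3  = rollapply(store_sales, 4, prev3(sd),
                             align = "left", partial = TRUE)
  ) %>%
  ungroup()

res %>% filter(date == "3/28/2019")
```

Routing mean and sd through the same wrapper keeps the window logic in one place. For the oldest row of each weekday, both columns are NA because there is no earlier week to summarise. For the second-oldest row, sd is also NA because a single value has no sample standard deviation.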

Ronak Shah