I have the following sample df, consisting of a category (Cat) in which different articles are shown (Pizza or Pasta) together with their sales data for different calender weeks (CW). In some weeks there is a promotion, which causes the sales to go up. The 6 largest sales values are marked as such promotions.
# example df
set.seed(99999)
df <- data.frame(Cat = rep(c("A","B"),52),
Article = rep(c("Pizza","Pasta"),52))
df <- df[order(df$Cat),]
df$CW <- rep(1:52,2)
df$sales <- abs(2+rnorm(104))
df$promotion <- ifelse(rank(df$sales,ties.method=c("last"))>98,1,0)
The challenge now is to calculate a "baseline" against which to judge the promotion. The baseline needs to meet the following requirements:
- per article and category, three weeks prior to the promotion need to be calculated as the baseline
- the baseline should be "running", i.e. there should be a baseline for every calender week
- within the three previous weeks for baseline construction, there must be no promotion. If there is, that particular week is ignored and the more previous one is taken until a week with no promotion is found and three weeks are achieved. If this should not be possible, e.g. CW 1-3, then the next future baseline value is taken
I have tried solving this challenge using existing posts on stackoverflow , but with no success. Hence I am asking for help.