3

I'm trying to calculate the average values based on the date in a column. With the option to select the number of previous days for example, 4 days. Get the average of the previous 4 records minus the StartDate and roll the average down until there is an EndDate.

I was trying the

tapply(df$Boe, df$ShutinDate, function(x) mean(tail(sort(x), 5)))

function but I’m not getting the correct average.

Output

Name    DATE    Values  StartDate   EndDate    Average
TestA   3/3/2017    50          
TestA   3/4/2017    75          
TestA   3/5/2017    25          
TestA   3/6/2017    100         
TestA   3/7/2017    100         
TestA   3/8/2017    50          
TestA   3/9/2017    80          
TestA   3/10/2017   90          
TestA   3/11/2017   25             3/11/2017        
TestA   3/12/2017   0                           80
TestA   3/13/2017   0                           80
TestA   3/14/2017   0                           80
TestA   3/15/2017   0                           80
TestA   3/16/2017   50      3/16/2017   
RLW
  • 66
  • 5
  • Please use `dput(df)` so we can reproduce your data and attempt solutions. Providing an sample of the desired output would also be good. – Calum You Apr 20 '18 at 21:16

2 Answers2

1

1) We group by Name (assuming rollapply should be done separately for each Name) and then use width = list(-seq(4)) with rollapply which uses offsets -1, -2, -3, -4 for each application of mean. (Offset 0 would be the current point but we want the 4 prior here.)

Not clear what you are referring to regarding start time so that part has been left out. Also I have assumed that the data is sorted (which is the case in the question). You might also want to convert the dates to "Date" class but that isn't needed to answer the question if the rows are already sorted.

library(zoo)

roll <- function(x) rollapply(x, list(-seq(4)), mean, fill = NA)
transform(DF, Average = ave(Values, Name, FUN = roll))

2) or if you like dplyr then using roll from above:

library(dplyr)
library(zoo)

DF %>% 
   group_by(Name) %>% 
   mutate(Average = roll(Values)) %>% 
   ungroup()
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thanks for suggesting that I look at this answer. I've studied part 2) but haven't been able to apply it to my situation. https://stackoverflow.com/questions/50023898/compute-a-rolling-weighted-sum-by-group – Bobby Apr 25 '18 at 15:05
0

An option is to use zoo::rollapply along with dplyr::lag as:

library(dplyr)
library(lubridate)
library(zoo)

df %>% mutate(DATE = mdy(DATE)) %>%   #Convert to Date
  arrange(Name, DATE) %>%             #Order on Name and DATE
  mutate(Avg = rollapply(Values, 4, mean, fill= NA, align = "right")) %>%
  mutate(Average = lag(Avg)) %>%      # This shows mean for previous 4 rows
  select(-Avg)
#     Name       DATE Values Average
# 1  TestA 2017-03-03     50      NA
# 2  TestA 2017-03-04     75      NA
# 3  TestA 2017-03-05     25      NA
# 4  TestA 2017-03-06    100      NA
# 5  TestA 2017-03-07    100   62.50
# 6  TestA 2017-03-08     50   75.00
# 7  TestA 2017-03-09     80   68.75
# 8  TestA 2017-03-10     90   82.50
# 9  TestA 2017-03-11     25   80.00
# 10 TestA 2017-03-12      0   61.25
# 11 TestA 2017-03-13      0   48.75
# 12 TestA 2017-03-14      0   28.75
# 13 TestA 2017-03-15      0    6.25
# 14 TestA 2017-03-16     50    0.00

Data:

df <- read.table(text = 
"Name    DATE    Values  
TestA   '3/3/2017'    50          
TestA   '3/4/2017'    75          
TestA   '3/5/2017'    25          
TestA   '3/6/2017'    100         
TestA   '3/7/2017'    100         
TestA   '3/8/2017'    50          
TestA   '3/9/2017'    80          
TestA   '3/10/2017'   90          
TestA   '3/11/2017'   25  
TestA   '3/12/2017'   0   
TestA   '3/13/2017'   0   
TestA   '3/14/2017'   0   
TestA   '3/15/2017'   0   
TestA   '3/16/2017'   50",
header = TRUE, stringsAsFactors = FALSE) 
MKR
  • 19,739
  • 4
  • 23
  • 33