You can define a function that calculates a moving average with any window size you choose, and by default fills either end with NA
if you only want the rolling average to include a full window's worth of values. If you want it to average the values within a partial window, you can specify that too.
It will handle any NAs
in the input, and if all the inputs in a particular window are NA
it will return an NA
in the appropriate point of the output vector.
This function could be a lot shorter, but I have written it this way for clarity, and have included basic error checking and explanatory comments.
moving_average <- function(vec, window, full.window.only = TRUE)
{
# Define the size of the window on either side
half_window <- window %/% 2
# Ensure the vector is long enough to have at least one window
stopifnot(length(vec) > window)
# Get the indices we want to average
indices <- lapply(seq_along(vec),
function(y) {
z <- y + 0:(2 * half_window) - half_window;
z[z > 0 & z <= length(vec)]})
# Get the rolling mean at each of our indices, handling NAs as we go
result <- sapply(indices,
function(x){
if(all(is.na(vec[x]))) return(NA)
else return(mean(vec[x], na.rm = TRUE))})
# Insert NAs if we don't want partial means at either end
if(full.window.only)
{
result[1:half_window] <- NA
result[(length(vec) - half_window + 1):length(vec)] <- NA
}
return(result)
}
I'll show an example here, by trying to recreate some sample data from your description:
set.seed(1) # Ensures the random numbers are reproducible
df <- data.frame(times = as.POSIXct("2019-12-25 09:00:00") + 1:20 * 900,
values = rnorm(20, 20, 4))
The data frame looks like this:
times values
1 2019-12-25 09:20:00 17.49418
2 2019-12-25 09:40:00 20.73457
3 2019-12-25 10:00:00 16.65749
4 2019-12-25 10:20:00 26.38112
5 2019-12-25 10:40:00 21.31803
6 2019-12-25 11:00:00 16.71813
7 2019-12-25 11:20:00 21.94972
8 2019-12-25 11:40:00 22.95330
9 2019-12-25 12:00:00 22.30313
10 2019-12-25 12:20:00 18.77845
11 2019-12-25 12:40:00 26.04712
12 2019-12-25 13:00:00 21.55937
13 2019-12-25 13:20:00 17.51504
14 2019-12-25 13:40:00 11.14120
15 2019-12-25 14:00:00 24.49972
16 2019-12-25 14:20:00 19.82027
17 2019-12-25 14:40:00 19.93524
18 2019-12-25 15:00:00 23.77534
19 2019-12-25 15:20:00 23.28488
20 2019-12-25 15:40:00 22.37561
For the purposes of this example, I'll set the window size to 5 (the value and the two measurements before and after it). You'll want to set yours to 30 (or maybe 60, I wasn't sure from your question). All I have to do is:
df$rolling_average <- moving_average(df$values, 5)
Now df
looks like this:
times values rolling_average
1 2019-12-25 09:15:00 17.49418 NA
2 2019-12-25 09:30:00 20.73457 NA
3 2019-12-25 09:45:00 16.65749 20.51708
4 2019-12-25 10:00:00 26.38112 20.36187
5 2019-12-25 10:15:00 21.31803 20.60490
6 2019-12-25 10:30:00 16.71813 21.86406
7 2019-12-25 10:45:00 21.94972 21.04846
8 2019-12-25 11:00:00 22.95330 20.54054
9 2019-12-25 11:15:00 22.30313 22.40634
10 2019-12-25 11:30:00 18.77845 22.32827
11 2019-12-25 11:45:00 26.04712 21.24062
12 2019-12-25 12:00:00 21.55937 19.00824
13 2019-12-25 12:15:00 17.51504 20.15249
14 2019-12-25 12:30:00 11.14120 18.90712
15 2019-12-25 12:45:00 24.49972 18.58229
16 2019-12-25 13:00:00 19.82027 19.83435
17 2019-12-25 13:15:00 19.93524 22.26309
18 2019-12-25 13:30:00 23.77534 21.83827
19 2019-12-25 13:45:00 23.28488 NA
20 2019-12-25 14:00:00 22.37561 NA
And to check the results visually, let's plot the rolling average as a line over the points:
plot(df$times, df$values, xlab = "Time", ylab = "Value", main = "Moving average")
lines(df$times, df$rolling_average, col = "red")
Which looks like this:
