I am using R. I know calculating moving average is a topic with several answers in this site, but I have some problems that make my question unique.
I have a data frame including 8784 hourly concentrations (366 days * 24 hours) of an air pollutant (Ozone). This data frame includes some NaN values (missing values). The procedure contains following steps:
1- calculating 8-hour moving (rolling) averages of hourly concentrations: i.e. every 8 concentrations should be averaged in this way: average of 1 to 8, average of 2 to 9, average of 3 to 10, etc. This leads to obtaining 24 moving averages for each day (every 24 hours).
2- for each day, I want the maximum of 8-hour moving averages: i.e. among the 24 moving averages, the highest number should be selected. Finally, 366 moving average (366 days) will be selected.
3- A new data frame containing 366 moving averages should be created.
I know there are some packages (openair, zoo, TTR) that do something like this, but are there any ways to write the codes without any packages?
An Exmaple of my data
ColName
1 18.76
2 12.92
3 8.12
4 NaN
5 12.92
6 3.77
7 18.76
8 9.52
9 94.09
10 18.76
11 14.13
12 8.12
13 2.04
14 12.92
15 9.17
.
.
.
8783 34.58
8784 64.23
The name of main data frame is "Hourly". I tried these codes:
Hourly1 <- c(0, cumsum(ifelse(is.nan(Hourly), 0, Hourly)))
rsum <- (Hourly1[(Hourly1+1):length(Hourly1)] - Hourly1[1:(length(Hourly1) - 8)]) / 8
But when I try the first line, the following error occurs:
Error in is.nan(Hourly) : default method not implemented for type 'list'
UPDATE: I used the following codes, but the maximum of 8-hour averages is not calculated right:
Hourly2<-as.numeric(Hourly$Average)
names(Hourly2) <- rep(seq.Date(as.Date("2017-01-01"), by=1, length.out=366), each=24)
x<-Hourly2
#use cumsum to get the moving average, keep NaNs
cx <- c(0, cumsum(ifelse(is.nan(x), 0, x))) + c(0,x)*0
n <- 8
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n
res <- data.frame(mov_avg=rsum, days=names(rsum))
#select max from each day, ignoring NaN's
mx <- aggregate(mov_avg~days, data=res, max)
I compared the final results(366 maximum of 8-hour averages, each for 1 day of year) with a standard pre-approved dataset. In some days, the codes calculated averages correctly, but in some other days not! I did not get its logic.
You can find my raw dataset here!
UPDATE 2:
Here is a link to download the final results produced by different methods!
UPDATE3:
The difference between the results was due to the different methods for calculating moving averages. There are three methods for calculating moving averages: left, right, and center. The codes proposed by the guys here follow the "right" method.