How to get hourly average for a timeseries in R for a specified date range?

Question

I have hourly data for multiple areas (A, B, C) over 2 years, and I want to get an hourly average over a specified date range for each area. I read through "How to make a great R reproducible example" but wasn't sure how to use dput() to properly represent my data. Below is a sample of my data, copied from the output of dput(mydata):

structure(list(time = structure(c(1451606400, 1451610000, 1451613600, 
1451617200, 1451620800, 1451624400, 1451628000, 1451631600, 1451635200, 
1451638800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    A = c(0.0173731172095063, 0.0175417882503753, 0.0175839560105925, 
    0.017499620490158, 0.0173309494492891, 0.017668291531027, 
    0.017836962571896, 0.017836962571896, 0.0182586401740685, 
    0.0234452746807901), B = c(0.0173567013800694, 0.0173567013800694, 
    0.0170744785934016, 0.0172155899867355, 0.0170744785934016, 
    0.0172155899867355, 0.0172155899867355, 0.0172861456834025, 
    0.0173567013800694, 0.0198261507634126), C = c(0.00791114205246669, 
    0.00806936489351603, 0.00806936489351603, 0.00806936489351603, 
    0.00806936489351603, 0.00822758773456536, 0.00854403341666403, 
    0.00854403341666403, 0.00854403341666403, 0.012341381601848
    )), class = "data.frame", row.names = c(NA, 10L))

Basically I took the time data I had and created separate columns to represent the year, month, day and hour.

structure(list(Year = c("2016", "2016", "2016", "2016", "2016", 
"2016", "2016", "2016", "2016", "2016"), Month = c("01", "01", 
"01", "01", "01", "01", "01", "01", "01", "01"), Day = c("01", 
"01", "01", "01", "01", "01", "01", "01", "01", "01"), hour = c("00", 
"01", "02", "03", "04", "05", "06", "07", "08", "09"), timedata = structure(c(1451606400, 
1451610000, 1451613600, 1451617200, 1451620800, 1451624400, 1451628000, 
1451631600, 1451635200, 1451638800), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = "data.frame", row.names = c(NA, 10L
))

I want the average values for each of the 24 hours, as shown below, based on a specified date range. I split the time into separate year, month, day, and hour columns so that I could use something like group_by(), but I ran into a couple of issues.

I am looking to get the average for a specified date range (e.g. Jan to Mar without weekends).

The final output I am expecting is a 25 x 4 matrix. The value x at hour 0:00 below would be the average of the weekday values from Jan to Mar at hour 0:00 for area A.

time A B C 
0:00 x
1:00
2:00
3:00

Thank you.

score 0 · Accepted Answer · answered Jul 25 '21 at 03:29

You can try this -

library(dplyr)
library(lubridate)

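# Average A, B and C by hour of day for weekdays in Jan-Mar
df %>% 
  mutate(month = month(time), 
         hour = hour(time)) %>%
  # %u is the weekday number (1 = Monday, ..., 7 = Sunday)
  filter(format(time, '%u') %in% 1:5, month %in% 1:3) %>%
  group_by(hour) %>%
  summarise(across(A:C, ~ mean(.x, na.rm = TRUE)))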

format(time, '%u') %in% 1:5 keeps only the weekdays (Mon-Fri), since %u returns the weekday number with Monday as 1; month %in% 1:3 keeps only the months Jan-Mar.
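
If you need an arbitrary start and end date rather than whole months, the same idea can be wrapped in a small function. The sketch below is a base R alternative (aggregate() instead of dplyr), not the code above; the function name hourly_weekday_mean, its arguments, and the toy data frame are made up for illustration. It assumes a POSIXct column called time and value columns A, B, and C.

```r
# Hypothetical helper (not part of the original answer): hourly means of
# A, B and C over weekdays between `start` and `end` (inclusive), base R only.
hourly_weekday_mean <- function(df, start, end, tz = "UTC") {
  day  <- as.Date(df$time, tz = tz)
  keep <- day >= as.Date(start) & day <= as.Date(end) &
    format(df$time, "%u", tz = tz) %in% as.character(1:5)  # Mon-Fri only
  sub <- df[keep, ]
  # Group by hour of day (0-23) and average each area column
  out <- aggregate(sub[c("A", "B", "C")],
                   by = list(hour = as.integer(format(sub$time, "%H", tz = tz))),
                   FUN = mean, na.rm = TRUE)
  out$time <- sprintf("%d:00", out$hour)  # label rows as 0:00, 1:00, ...
  out[order(out$hour), c("time", "A", "B", "C")]
}

# Made-up toy data: 72 hourly rows starting Fri 2016-01-01 00:00 UTC,
# so only the first 24 rows fall on a weekday
df <- data.frame(
  time = seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 72),
  A = 1:72, B = 2 * (1:72), C = 0
)
res <- hourly_weekday_mean(df, "2016-01-01", "2016-03-31")
```

The result has one row per hour that appears in the filtered data, with a time label followed by the A, B, and C means. Note that aggregate() will fail if the date range leaves no rows, so you may want to check for that first.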
