7

I have one week of data with a reading every 5 seconds. An example of data is below.

9/1/2012 00:00:00    1
9/1/2012 00:00:05    2
9/1/2012 00:00:10    3

I want to calculate the hourly average for each day. Then make a multi-line plot of "average hourly reading vs. hour" with lines representing different dates.

The one I have here is for weekly average

data$date = as.POSIXct(strptime(data$date, 
                  format = "%d/%m/%Y %H:%M","GMT")) 
means <- aggregate(data["nox"], format(data["date"],"%Y-%U"),
                 mean, na.rm = TRUE) 

For daily average, it is

data$date = as.POSIXct(strptime(data$date, 
                 format = "%d/%m/%Y %H:%M","GMT"))
means <- aggregate(data["nox"], format(data["date"],"%Y-%j"),
                 mean, na.rm = TRUE) 

Any one knows how to calculate the hourly average for each day.

Jaap
  • 81,064
  • 34
  • 182
  • 193
  • Somebody in another question wanted to know why I couldn't guess their object name and instead used 'dat'. In this instance I changed it to 'dat' because I refuse to use either 'data' or 'df' as dataframe names because they are both function names. – IRTFM Sep 11 '12 at 05:36
  • 1
    Dup? http://stats.stackexchange.com/questions/7268/how-to-aggregate-by-minute-data-for-a-week-into-hourly-means – naught101 Sep 11 '12 at 05:57
  • I've decided you should _not_ pick my format answer and instead choose @mrdwad's answer (whatever it's deficiencies might be in formatting). The cut.POSIXt solution is _much_ more flexible because it allows variable hour or minute intervals, eg, "15 mins". – IRTFM Sep 11 '12 at 07:02

3 Answers3

8

I like @DWin's answer, but I had also remembered seeing once a help file for ?cut.Date which can also be used in this case. I've made up some data so you can see the results over a few hours:

set.seed(1)
data <- data.frame(date = seq(from = ISOdatetime(2012, 01, 01, 00, 00, 00),
                              length.out = 4320, by=5),
                   nox = sample(1:20, 4320, replace=TRUE))

hr.means <- aggregate(data["nox"], 
                      list(hour = cut(data$date, breaks="hour")), 
                      mean, na.rm = TRUE)
hr.means
#                  hour      nox
# 1 2012-01-01 00:00:00 10.60694
# 2 2012-01-01 01:00:00 10.13194
# 3 2012-01-01 02:00:00 10.33333
# 4 2012-01-01 03:00:00 10.38194
# 5 2012-01-01 04:00:00 10.51111
# 6 2012-01-01 05:00:00 10.26944
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • I was expecting a longer set of labels than the format answer would produce but instead got shorter ones (and puzzling different than what you show). Not ideal IMO. – IRTFM Sep 11 '12 at 05:43
  • I get with my example: 2012-01-09 ... (no hour or time designation at all.) With your example I get: 2012-01-01 00:00:00 (superfluous minutes and seconds) – IRTFM Sep 11 '12 at 06:36
  • @DWin, when I run your code with my sample data, I get the same results, but the first column is named "`date`", and the first row looks like `1 2012-01-01 00 10.60694`. – A5C1D2H2I1M1N2O1R2T1 Sep 11 '12 at 06:40
  • @DWin, sorry, one more comment: yes, there are superfluous minutes and seconds in my output--that's from using `cut.Date` and is why I had said I like your answer more! ;-) – A5C1D2H2I1M1N2O1R2T1 Sep 11 '12 at 06:44
  • As noted above: I like yours better, now that I understand the true Power of the Cut. – IRTFM Sep 11 '12 at 07:04
5

It would only require changing your format specification in the by-vector:

hr.means <- aggregate(dat["V1"], format(dat["date"],"%Y-%m-%d %H"),
             mean, na.rm = TRUE) 
hr.means
#---------
           date V2
1 2012-01-09 00  2
IRTFM
  • 258,963
  • 21
  • 364
  • 487
0

I got here from here so I have the data in a slightly different form, but using lubridate you can easy parse your data format as well.

library(tibble)
library(dplyr)
library(lubridate)

tbl <- tribble(
    ~TIME,                 ~MEASURE,
    "2018-01-01 06:58:50",    05,
    "2018-01-01 07:00:00",    10,
    "2018-01-01 07:04:45",    20,
    "2018-01-01 07:04:55",    25,
    "2018-01-01 07:21:00",    20,
    "2018-01-01 07:58:04",    18,
    "2018-01-01 07:59:59",    12,
    "2018-01-01 08:00:00",    17,
    "2018-01-01 08:01:04",    30
) %>% mutate(TIME = ymd_hms(TIME))

With the data in a form where you can manipulate the date/time, you can summarise it per date+hour or just per hour over all dates as this:

# if you want per date
tbl %>% 
    mutate(date = date(TIME), hour = hour(TIME)) %>% 
    group_by(date, hour) %>% summarise(m = mean(MEASURE))

# if you want per hour over all dates
tbl %>% 
    mutate(hour = hour(TIME)) %>% 
    group_by(hour) %>% summarise(m = mean(MEASURE))

To plot it using points and lines with ggplot2, you can do

library(ggplot2)
tbl %>% 
    mutate(hour = hour(TIME)) %>% 
    group_by(hour) %>% summarise(m = mean(MEASURE)) %>%
    ggplot(aes(x = hour, y = m)) + geom_point() + geom_line()
Thomas Mailund
  • 1,674
  • 10
  • 16