In R aggregating by hour changes date and time

Question

I regularly use the aggregate function to find means and sums of POSIXlt data per hour and per day. I am trying to use the same function on a new dataset to get hourly averages, but when I apply it, it changes the timestamp.

The data is a data.frame (called "moT") like this:

                  TS      T
1  2016-06-26 10:10:34 19.662 
2  2016-06-26 10:40:34 21.091
3  2016-06-26 11:10:34 23.388
4  2016-06-26 11:40:34 24.448
5  2016-06-26 12:10:34 25.513
6  2016-06-26 12:40:34 26.390
7  2016-06-26 01:10:34 27.468
8  2016-06-26 01:40:34 27.567
9  2016-06-26 02:10:34 26.977
10 2016-06-26 02:40:34 25.222
11 2016-06-26 03:10:34 23.100
12 2016-06-26 03:40:34 24.158
13 2016-06-26 04:10:34 21.951
14 2016-06-26 04:40:34 21.473
15 2016-06-26 05:10:34 19.948
16 2016-06-26 05:40:34 19.472
17 2016-06-26 06:10:34 18.806
18 2016-06-26 06:40:34 16.808
19 2016-06-26 07:10:34 15.282
20 2016-06-26 07:40:34 14.517

or as per suggested format:

structure(list(TS = structure(list(sec = c(34, 34, 34, 34, 34, 
  34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34), 
min = c(10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 
10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L), hour = c(10L, 
10L, 11L, 11L, 12L, 12L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 
5L, 5L, 6L, 6L, 7L, 7L), mday = c(26L, 26L, 26L, 26L, 26L, 
26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 26L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), year = c(116L, 
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L), wday = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), yday = c(177L, 177L, 177L, 177L, 177L, 177L, 
177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 
177L, 177L, 177L, 177L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
zone = c("GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", 
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", 
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5"
), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt", 
"POSIXt"), tzone = "Etc/GMT+5"), T = c(19.662, 21.091, 23.388, 
24.448, 25.513, 26.39, 27.468, 27.567, 26.977, 25.222, 23.1, 
24.158, 21.951, 21.473, 19.948, 19.472, 18.806, 16.808, 15.282, 
14.517)), .Names = c("TS", "T"), row.names = c(NA, 20L), class = "data.frame")

I apply this code to "moT":

dat <- aggregate(moT["T"], format(moT["TS"], "%Y-%m-%d %H"), mean)

I expect this output (for the first five rows):

              TS   meanT
1 "2016-06-26 10" 20.3765       
2 "2016-06-26 11" 23.918
3 "2016-06-26 12" 25.9515
4 "2016-06-26 13" 27.5175
5 "2016-06-26 14" 26.0995

that is what has happened when I have used the same function on other datasets.

but instead it is this:

               TS   meanT
1 "2016-01-07 00" 14.5650
2 "2016-01-07 01" 14.0380
3 "2016-01-07 02" 13.6540
4 "2016-01-07 03" 13.6540
5 "2016-01-07 04" 13.7500

Why is the date and time changing???

I have tried using POSIXct instead of POSIXlt, have tried reformatting the datetime objects in my csv files, have tried removing the time zone from the POSIXlt object.

I've seen this post How to calculate average of a variable by hour in R that would give me the result I want, but requires separating date and time into two columns. I'm happy to do that but I'd like to know why this is happening so I can avoid it in future and know which method to use for which data.

Thanks very much.

Either your data is on a 12-hour clock but missing AM/PM data or out of order. — alistaire, Mar 20 '17 at 20:45
Agree with alistaire. On my machine, code and data generated e.g. mean = 27.5175, 26.0995 for hour 01, 02 not hour 13, 14. — neilfws, Mar 20 '17 at 21:02
Its the AM/PM. Sorry, I should have caught that myself. Thank you so mcuh, it was driving me mad. — AWB, Mar 20 '17 at 21:55

In R aggregating by hour changes date and time

0 Answers0