I regularly use the aggregate function to find means and sums of POSIXlt data per hour and per day. I am trying to use the same function on a new dataset to get hourly averages, but when I apply it, it changes the timestamp.
The data is a data.frame (called "moT") like this:
TS T
1 2016-06-26 10:10:34 19.662
2 2016-06-26 10:40:34 21.091
3 2016-06-26 11:10:34 23.388
4 2016-06-26 11:40:34 24.448
5 2016-06-26 12:10:34 25.513
6 2016-06-26 12:40:34 26.390
7 2016-06-26 01:10:34 27.468
8 2016-06-26 01:40:34 27.567
9 2016-06-26 02:10:34 26.977
10 2016-06-26 02:40:34 25.222
11 2016-06-26 03:10:34 23.100
12 2016-06-26 03:40:34 24.158
13 2016-06-26 04:10:34 21.951
14 2016-06-26 04:40:34 21.473
15 2016-06-26 05:10:34 19.948
16 2016-06-26 05:40:34 19.472
17 2016-06-26 06:10:34 18.806
18 2016-06-26 06:40:34 16.808
19 2016-06-26 07:10:34 15.282
20 2016-06-26 07:40:34 14.517
or as per suggested format:
structure(list(TS = structure(list(sec = c(34, 34, 34, 34, 34,
34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34),
min = c(10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L,
10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L, 10L, 40L), hour = c(10L,
10L, 11L, 11L, 12L, 12L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 6L, 6L, 7L, 7L), mday = c(26L, 26L, 26L, 26L, 26L,
26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
26L, 26L, 26L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), year = c(116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L), wday = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), yday = c(177L, 177L, 177L, 177L, 177L, 177L,
177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L, 177L,
177L, 177L, 177L, 177L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
zone = c("GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5",
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5",
"GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5", "GMT+5"
), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"), tzone = "Etc/GMT+5"), T = c(19.662, 21.091, 23.388,
24.448, 25.513, 26.39, 27.468, 27.567, 26.977, 25.222, 23.1,
24.158, 21.951, 21.473, 19.948, 19.472, 18.806, 16.808, 15.282,
14.517)), .Names = c("TS", "T"), row.names = c(NA, 20L), class = "data.frame")
I apply this code to "moT":
dat <- aggregate(moT["T"], format(moT["TS"], "%Y-%m-%d %H"), mean)
I expect this output (for the first five rows):
TS meanT
1 "2016-06-26 10" 20.3765
2 "2016-06-26 11" 23.918
3 "2016-06-26 12" 25.9515
4 "2016-06-26 13" 27.5175
5 "2016-06-26 14" 26.0995
- that is what has happened when I have used the same function on other datasets.
but instead it is this:
TS meanT
1 "2016-01-07 00" 14.5650
2 "2016-01-07 01" 14.0380
3 "2016-01-07 02" 13.6540
4 "2016-01-07 03" 13.6540
5 "2016-01-07 04" 13.7500
Why is the date and time changing???
I have tried using POSIXct instead of POSIXlt, have tried reformatting the datetime objects in my csv files, have tried removing the time zone from the POSIXlt object.
I've seen this post How to calculate average of a variable by hour in R that would give me the result I want, but requires separating date and time into two columns. I'm happy to do that but I'd like to know why this is happening so I can avoid it in future and know which method to use for which data.
Thanks very much.