
I have some data which is formatted in the following way:

time     count 
00:00    17
00:01    62
00:02    41

So I have data from 00:00 to 23:59 hours, with one count per minute. I'd like to group the data into 15-minute intervals, such that:

time           count
00:00-00:15    148   
00:16-00:30    284

I have tried to do it manually, but this is exhausting, so I am sure there has to be a function or something to do it easily. I just haven't figured out yet how to do it.

I'd really appreciate some help!!

Thank you very much!

adrian1121
  • How did you transform your data to POSIXct? I have more or less the same data but can't convert it properly. I get `NAs` – Jack Sep 30 '18 at 12:10

3 Answers


For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))

Base R

Cut the data into 15-minute groups:

dat$by15 = cut(dat$time, breaks="15 min")
                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341

dplyr

library(dplyr)

dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
  summarise(count=sum(count))

data.table

library(data.table)

dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is because cut returns a factor and this just converts it back to date-time so that we can do math on it.

If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could do as.POSIXct(as.character(dat$by15)) + 60*14.

If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
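The question asked for labels like "00:00-00:15". A minimal sketch, assuming the same fake data as above, that puts the minute-level end point (start + 14 minutes) into the group labels. Instead of converting the cut factor back to date-times, it builds the breaks explicitly and passes them, together with the labels, to cut(). The names breaks, starts, ends and interval_labels are illustrative, not part of the original answer.

```r
# Recreate the fake data from above so this block runs on its own
set.seed(4984)
dat <- data.frame(time = seq(as.POSIXct("2016-05-01"),
                             as.POSIXct("2016-05-01") + 60*99, by = 60),
                  count = sample(1:50, 100, replace = TRUE))

# Explicit 15-minute breaks from 00:00 to 01:45 (one more break than intervals)
breaks <- seq(as.POSIXct("2016-05-01"), by = 15*60, length.out = 8)
starts <- breaks[-length(breaks)]
# Last whole minute inside each interval: start + 14 minutes
ends <- starts + 60*14
interval_labels <- paste(format(starts, "%H:%M"), format(ends, "%H:%M"), sep = "-")

# right = FALSE makes each interval [start, next start)
dat$by15 <- cut(dat$time, breaks = breaks, labels = interval_labels, right = FALSE)
dat.summary <- aggregate(count ~ by15, FUN = sum, data = dat)
dat.summary
```

The labels end at :14 rather than :15 so that no minute falls into two intervals; change the 60*14 offset if you prefer a different convention.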

eipi10
  • It is a great answer! How would you find the endpoints for each interval (most) efficiently? – Michal Majka Apr 24 '16 at 20:47
  • Perfect answer! Thank you very much! – adrian1121 Apr 24 '16 at 20:58
  • You are saying `For data that is in POSIXct format do...`, but the question is not about POSIXct format. How did you change the data to that format? – Jack Sep 30 '18 at 12:09
  • `as.POSIXct` in base R (as described in the answer) or one of the `mdy_hms`, `ymd_hms`, etc. with the `lubridate` package. – eipi10 Oct 01 '18 at 05:15
  • Does your data look like this: `"2018-01-02 03:04:00"`? Or this: `"03:04:00"`? Or some other format? It will be easier to help you if you provide more information about the problem you're having. – eipi10 Oct 01 '18 at 05:17
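For data in the question's "HH:MM" format, one way to get POSIXct is to paste an arbitrary date in front of each time and give as.POSIXct an explicit format. This is a minimal sketch, assuming all rows fall on a single day; the date "2016-05-01" and the data frame name raw are placeholders. as.POSIXct returns NA for strings that don't match the format, so checking for NAs right after conversion catches a mismatch early (use "%H:%M:%S" if your times include seconds).

```r
# The three example rows from the question
raw <- data.frame(time = c("00:00", "00:01", "00:02"),
                  count = c(17, 62, 41))

# Placeholder date pasted on so the strings can be parsed as full date-times
raw$time <- as.POSIXct(paste("2016-05-01", raw$time), format = "%Y-%m-%d %H:%M")

# Any string that did not match the format would have become NA
stopifnot(!anyNA(raw$time))

# From here the cut()/aggregate() approach above applies unchanged
raw$by15 <- cut(raw$time, breaks = "15 min")
res <- aggregate(count ~ by15, FUN = sum, data = raw)
res
```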

The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records).

  #     Function: Truncate (floor) POSIXct to time interval (specified in seconds)
  #       Author: Stephen McDaniel @ PowerTrip Analytics
  #        Date : 2017MAY
  #    Copyright: (C) 2017 by Freakalytics, LLC
  #      License: MIT

  floor_datetime <- function(date_var, floor_seconds = 60, 
        origin = "1970-01-01") { # defaults to minute rounding
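     if(!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
     # floor() is vectorised and keeps NA as NA, so whole columns
     # (including missing values) can be passed in without an if() check,
     # which would fail on vectors longer than one element
     as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
                origin = origin)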
  }

Sample output:

test <- data.frame(good = as.POSIXct(Sys.time()), 
   bad1 = as.Date(Sys.time()),
   bad2 = as.POSIXct(NA))

test$good_15 <- floor_datetime(test$good, 15 * 60)
test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
Error in floor_datetime(test$bad1, 15 * 60) : 
  Please pass in a POSIXct variable
test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)

test

                        good       bad1 bad2             good_15 bad2_15
    1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
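To get the per-interval sums the question asks for, the floored times can serve as the grouping column for aggregate(). A minimal sketch, assuming data like the fake data in the first answer but stored in UTC: flooring works on seconds since the epoch, so interval boundaries line up with UTC quarter hours (the same local boundaries for any time zone whose offset is a multiple of 15 minutes). The function is restated here in vectorised form so the block runs on its own.

```r
# Vectorised restatement of floor_datetime from above
floor_datetime <- function(date_var, floor_seconds = 60, origin = "1970-01-01") {
  if (!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
  as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
             origin = origin)
}

# 100 minutes of per-minute counts, stored in UTC
set.seed(4984)
dat <- data.frame(time = seq(as.POSIXct("2016-05-01", tz = "UTC"),
                             by = 60, length.out = 100),
                  count = sample(1:50, 100, replace = TRUE))

# Start of the 15-minute interval each row belongs to
dat$by15 <- floor_datetime(dat$time, 15 * 60)
dat.summary <- aggregate(count ~ by15, FUN = sum, data = dat)
dat.summary
```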

You can do it in one line by using the trs function from the foqat package, like this:

df_15mins=trs(df, "15 mins")

Below is a reproducible example:

library(foqat)
head(aqi[,c(1,2)])
#            Time        NO
#1 2017-05-01 01:00:00 0.0376578
#2 2017-05-01 01:01:00 0.0341483
#3 2017-05-01 01:02:00 0.0310285
#4 2017-05-01 01:03:00 0.0357016
#5 2017-05-01 01:04:00 0.0337507
#6 2017-05-01 01:05:00 0.0238120

# the values in each 15-minute interval are averaged (mean)
aqi_15mins=trs(aqi[,c(1,2)], "15 mins")
head(aqi_15mins)
#             Time         NO
#1 2017-05-01 01:00:00 0.02736549
#2 2017-05-01 01:15:00 0.03244958
#3 2017-05-01 01:30:00 0.03743626
#4 2017-05-01 01:45:00 0.02769419
#5 2017-05-01 02:00:00 0.02901817
#6 2017-05-01 02:15:00 0.03439455
TichPi