I found this post, which was helpful, but I still have a few questions. I am using a data frame (df) with the following data:

                  date download  
6  2018-07-10 08:57:12     1.47  
7  2018-07-10 08:59:00     0.57
8  2018-07-10 09:01:00     0.16
9  2018-07-10 09:05:11     0.08
10 2018-07-10 09:09:12     0.09
11 2018-07-10 09:13:11     0.14

What I want to do is separate the data into 15 min, 30 min, and 60 min chunks aligned to the hour (so for 15 min chunks the intervals would start at 0:00, 0:15, 0:30, and 0:45 past each hour). Currently I am using this code:

split(df, cut(df$date, breaks = "hour"))

Which in the hour case works exactly how I want and returns the following results:

$`2018-07-10 08:00:00`
                 date download
6 2018-07-10 08:57:12     1.47
7 2018-07-10 08:59:00     0.57

$`2018-07-10 09:00:00`
                  date download
8  2018-07-10 09:01:00     0.16
9  2018-07-10 09:05:11     0.08
10 2018-07-10 09:09:12     0.09

This is exactly the result I am looking for, as it separates the data strictly by the hour (08:00-08:59, 09:00-09:59, etc.). However, when I do:

split(df, cut(df$date, breaks = "30 min"))

I end up with the following result:

$`2018-07-10 08:57:00`
                  date download
6  2018-07-10 08:57:12     1.47
7  2018-07-10 08:59:00     0.57
8  2018-07-10 09:01:00     0.16
9  2018-07-10 09:05:11     0.08
10 2018-07-10 09:09:12     0.09
11 2018-07-10 09:13:11     0.14

While it does group the data by the 30 min (or 15 min) interval, it starts the first interval at the earliest time in the data, in this case 08:57:00, instead of starting on the hour (09:00:00). How can I make the 15 and 30 min intervals start on the hour, like the hour interval does?

This post kind of addresses it, but you end up with a count of events within each time frame, and it is also not very elegant or flexible. I was hoping to stick with split and cut because I am making a Shiny app, and it seems like that would be easier to make interactive (i.e. have a dropdown menu with the intervals, which could be inserted straight into the split call).

Thank you!

ancr
Comment: Create a lookup table with the intervals you desire, then use data.table's `foverlaps()` to perform an overlap join. – Wimpel Jul 11 '18 at 16:08

1 Answer

Below is a starting point (you may have to fine-tune the boundaries).

The idea is to generate the cutpoints yourself: when used on date-times, seq lets you specify the step for the by parameter in days, hours, or minutes.

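library(lubridate)

# Rebuild the example dates from the question
date <- ymd_hms(
  c("2018-07-10 08:57:12",
    "2018-07-10 08:59:00",
    "2018-07-10 09:01:00",
    "2018-07-10 09:05:11",
    "2018-07-10 09:09:12",
    "2018-07-10 09:13:11"))
# Random placeholder values for the download column
download <- round(runif(length(date), min = 1, max = 10))
df <- data.frame(date, download)

# Round the data range to the hour and pad it by one hour on each side
from <- round(min(df$date), "hour") - hours(1)
to <- round(max(df$date), "hour") + hours(1)
# Cutpoints every 30 minutes, aligned to the hour
breaks <- seq(from, to, by = "30 min")
split(df, cut(df$date, breaks))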

This leads to the following (the download values are random, so yours will differ):

$`2018-07-10 08:00:00`
[1] date     download
<0 rows> (or 0-length row.names)

$`2018-07-10 08:30:00`
                 date download
1 2018-07-10 08:57:12        4
2 2018-07-10 08:59:00        7

$`2018-07-10 09:00:00`
                 date download
3 2018-07-10 09:01:00        5
4 2018-07-10 09:05:11        1
5 2018-07-10 09:09:12        3
6 2018-07-10 09:13:11        4

$`2018-07-10 09:30:00`
[1] date     download
<0 rows> (or 0-length row.names)
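
To make the interval selectable (for instance from a dropdown), the same idea can be wrapped in a small function. The sketch below is only one possible way to do it, in base R: the helper name `split_by_interval` and its arguments are hypothetical, and it assumes the time column is POSIXct and that the interval divides an hour evenly (15, 30, or 60 min), so that every break lands on a clock boundary. It uses `drop = TRUE` to discard the empty chunks.

```r
# Hypothetical helper (not from the original answer): split a data frame
# into clock-aligned chunks. Assumes `time_col` is POSIXct and `interval`
# divides an hour evenly, e.g. "15 min", "30 min", "60 min" or "hour".
split_by_interval <- function(df, interval = "30 min", time_col = "date") {
  times <- df[[time_col]]
  # Floor the earliest time to the start of its hour
  from <- as.POSIXct(trunc(min(times), units = "hours"))
  # Go one hour past the floored latest time so every row falls in a bin
  to <- as.POSIXct(trunc(max(times), units = "hours")) + 3600
  breaks <- seq(from, to, by = interval)
  # right = FALSE makes each bin [start, end); drop = TRUE removes empty bins
  split(df, cut(times, breaks = breaks, right = FALSE), drop = TRUE)
}

# Example data taken from the question
df <- data.frame(
  date = as.POSIXct(c("2018-07-10 08:57:12", "2018-07-10 08:59:00",
                      "2018-07-10 09:01:00", "2018-07-10 09:05:11",
                      "2018-07-10 09:09:12", "2018-07-10 09:13:11"),
                    tz = "UTC"),
  download = c(1.47, 0.57, 0.16, 0.08, 0.09, 0.14)
)

res_30 <- split_by_interval(df, "30 min")
res_15 <- split_by_interval(df, "15 min")
```

With this data, both calls should return two non-empty groups: the two rows before 09:00 and the four rows after it. An interval that does not divide the hour evenly (say "45 min") could leave the last rows outside the breaks, so the padding of `to` would need adjusting in that case.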
Eric Lecoutre