3

I have a data_frame with POSIXct date-times. I would now like to create a variable that cuts these date-times into timebands: 1 -- [00:00:00, 08:00:00), 2 -- [08:00:00, 17:00:00), 3 -- [17:00:00, 18:30:00), 4 -- [18:30:00, 00:00:00).

Here is some sample data:

df_times = data_frame(
  datetime = seq.POSIXt(
    from = as.POSIXct(strftime("2016-01-01 00:00:00", format = "%Y-%m-%d :%H:%M:%S")),
    by = "min",
    length.out = 100000
  ),
  value = rnorm(100000)
)

Here is the expected output:

> df_times
# A tibble: 100,000 × 3
              datetime      value  band
                <dttm>      <dbl> <dbl>
1  2016-01-01 00:00:00  0.5855288     1
2  2016-01-01 00:01:00  0.7094660     1
3  2016-01-01 00:02:00 -0.1093033     1
4  2016-01-01 00:03:00 -0.4534972     1
5  2016-01-01 00:04:00  0.6058875     1
6  2016-01-01 00:05:00 -1.8179560     1
7  2016-01-01 00:06:00  0.6300986     1
8  2016-01-01 00:07:00 -0.2761841     1
9  2016-01-01 00:08:00 -0.2841597     1
10 2016-01-01 00:09:00 -0.9193220     1
# ... with 99,990 more rows

I have tried cut.POSIXt but that insists on keeping track of dates. An ideal solution will use dplyr::recode or forcats::.

tchakravarty
  • 10,736
  • 12
  • 72
  • 116

2 Answers2

4

Here is the solution I think directly translates the intent of the question into code:

set.seed(12345)

# create a dataset
df_times = data_frame(
  datetime = seq.POSIXt(
    from = as.POSIXct("2016-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S"),
    by = "min",
    length.out = 100000
  ),
  value = rnorm(100000)
) %>% 
  mutate(
    time = times(format(datetime, "%H:%M:%S")),
    cut(
      time,
      breaks = times(c(
        "00:00:00", 
        "08:00:00", 
        "17:00:00",
        "18:30:00",
        "23:59:59"
      )),
      labels = c(
        "1",
        "2",
        "3",
        "4"
      ),
      include.lowest = TRUE,
      right = FALSE
    )
  )
tchakravarty
  • 10,736
  • 12
  • 72
  • 116
3

You could create an hour column and then cut that:

df_times$hour = as.numeric(df_times$datetime) %% (24*60*60) / 3600
df_times$band = cut(df_times$hour, breaks=c(0,8,17,18.5,24), include.lowest=TRUE, 
                    right=FALSE)
eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Hi eipi10, thanks for this, but I was ideally looking for a more elegant/readable solution, as I already have a similar solution that works, but is cumbersome to work with. – tchakravarty Oct 03 '16 at 06:17
  • I have posted an answer that I think is more compact and expressive -- comments welcome. – tchakravarty Oct 03 '16 at 06:37