I am working on a transaction data set that reports the time of each transaction in hhmmss format, e.g., 204629 or 215450.
I would like to derive from this column a factor variable whose levels indicate periods of the day, e.g., 12-3 pm, 3-6 pm, and so on.
I could use the `str_sub` function to extract the hour from the column and convert it to a factor, but is there a more efficient way to do this?

Nibbles
  • Read it in as integer and use `cut`, like normal data binning. See the [FAQ on binning data](https://stackoverflow.com/q/5570293/903061) for examples. – Gregor Thomas Apr 02 '20 at 20:04
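Here is a minimal base R sketch of that suggestion. It assumes the times are stored as integers, so a time such as 020129 arrives as 20129, and that three-hour slots are wanted. The vector `time` and the slot labels are illustrative and do not come from the question's data:

```r
# Transaction times read in as integers: leading zeros are lost,
# so 020129 becomes 20129. Integer division still recovers the hour.
time <- c(204629L, 215450L, 20129L, 123000L)

# hhmmss %/% 10000 drops the mmss part, leaving hh (0-23)
hour <- time %/% 10000L

# Three-hour bins closed on the left: [0,3), [3,6), ..., [21,24)
labels <- c("12-3 am", "3-6 am", "6-9 am", "9 am-12 pm",
            "12-3 pm", "3-6 pm", "6-9 pm", "9 pm-12 am")
period <- cut(hour, breaks = seq(0, 24, by = 3), labels = labels, right = FALSE)

data.frame(time, hour, period)
```

`right = FALSE` makes each bin include its lower boundary. With this setting, 15:00 falls in "3-6 pm" rather than "12-3 pm".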

1 Answer

You can use `dplyr::mutate()` and `stringr::str_sub()` to create the hour column, and then use `cut()` to divide the hour column into your periods.

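```r
library(dplyr)
library(stringr)
library(lubridate)  # for hms()

tibble(string = c("215450", "220102", "020129")) %>% 
  # Split the zero-padded hhmmss string into hour, minute and second
  mutate(hour = str_sub(string, 1, 2) %>% as.numeric,
         minute = str_sub(string, 3, 4) %>% as.numeric,
         second = str_sub(string, 5, 6) %>% as.numeric,
         # Rebuild an "h:m:s" string and parse it as a lubridate Period
         time = str_c(hour, minute, second, sep = ":") %>% hms()) %>% 
  # breaks = 2 splits the observed range of hour into two equal-width intervals
  mutate(period = cut(hour, breaks = 2, labels = c("period one", "period two")))
```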

```
# A tibble: 3 x 6
  string  hour minute second time        period    
  <chr>  <dbl>  <dbl>  <dbl> <Period>    <fct>     
1 215450    21     54     50 21H 54M 50S period two
2 220102    22      1      2 22H 1M 2S   period two
3 020129     2      1     29 2H 1M 29S   period one
```
Conor
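With `breaks = 2` in the answer above, `cut()` splits the observed range of `hour` into two equal-width intervals. The boundaries therefore shift with the data and will not match slots such as 12-3 pm. The sketch below passes fixed boundaries instead. It assumes zero-padded hhmmss strings and three-hour slots. The extra row `134500` and the names `slot_breaks`, `slot_labels` and `out` are hypothetical:

```r
library(dplyr)
library(stringr)

# Fixed three-hour boundaries 0, 3, ..., 24 and labels like "12:00-15:00"
slot_breaks <- seq(0L, 24L, by = 3L)
slot_labels <- paste0(sprintf("%02d", head(slot_breaks, -1)), ":00-",
                      sprintf("%02d", tail(slot_breaks, -1)), ":00")

out <- tibble(string = c("215450", "220102", "020129", "134500")) %>%
  # First two characters are the hour (assumes zero-padded strings)
  mutate(hour = as.numeric(str_sub(string, 1, 2)),
         # Bins [0,3), [3,6), ..., [21,24), independent of the data's range
         period = cut(hour, breaks = slot_breaks, labels = slot_labels,
                      right = FALSE))

out
```

Because the breaks are fixed, every slot appears as a factor level even when no transactions fall in it. This keeps counts comparable across data sets.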