Currently I have code that returns a tibble counting the events that occur each day, using the following:

online_toy_purchases %>%
  mutate(interval = lubridate::date(date)) %>%
  group_by(interval) %>%
  summarise(count = n())

This currently returns the following:

# A tibble: 31 x 2
interval    count
2018-12-01    500
2018-12-02    300
2018-12-03    400
2018-12-04    200
2018-12-05    600
...
2018-12-31    100

I would like my code to group by each hour and each day for a more granular view of the data, which would return the following:

# A tibble: 744  x 2
interval             count
2018-12-01 01:00:00    50    
2018-12-01 02:00:00    60  
2018-12-01 03:00:00    20  
2018-12-01 04:00:00    80  
...
2018-12-31 24:00:00    10 

online_toy_purchases is a tibble that contains, among other features, the ID of the transaction and a timestamp containing the date and the hour, minute and second of the purchase (e.g. "2018-12-01 01:20:58").

Sepa
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jan 03 '19 at 22:07
  • What if you just did `group_by(interval, lubridate::hour(date))`? Unable to test because there is no reproducible example. – MrFlick Jan 03 '19 at 22:08
  • This returns a tibble with interval, 'lubridate::hour(date)', count as features, with the middle feature displaying the hours. This is really close to what I want, but wouldn't be suitable for plotting. Working on getting some reproducible data to this post. – Sepa Jan 03 '19 at 22:19

1 Answer

This will count the number of rows within each hour of the data.

library(tidyverse)
online_toy_purchases %>%
  # assuming that "date" is formatted as a datetime variable already
  count(time = lubridate::floor_date(date, "1 hour")) %>%

  # additional step using padr::pad to add missing hours and
  #   tidyr::replace_na to make NAs into zeroes
  padr::pad() %>%
  replace_na(list(n=0))

For visualization and further analysis, it will be helpful to have rows recording periods with no data. You might alternatively accomplish something similar by converting to a tsibble.
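
Since the question has no reproducible data, here is a minimal self-contained sketch of the same idea. It swaps in tidyr::complete() for padr::pad() (so it is not the padr or tsibble approach itself), and the five sample timestamps are made up for illustration. It assumes "date" is already a datetime column.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical sample data: five purchases, none during the 02:00 hour
online_toy_purchases <- tibble(
  id   = 1:5,
  date = ymd_hms(c(
    "2018-12-01 01:20:58",
    "2018-12-01 01:45:10",
    "2018-12-01 03:05:00",
    "2018-12-01 03:30:30",
    "2018-12-01 04:59:59"
  ))
)

hourly_counts <- online_toy_purchases %>%
  # truncate each timestamp to the start of its hour, then count rows per hour
  count(interval = floor_date(date, "1 hour"), name = "count") %>%
  # add a row for every hour between the first and last observed hour,
  # filling hours that had no purchases with a count of 0
  complete(
    interval = seq(min(interval), max(interval), by = "1 hour"),
    fill = list(count = 0L)
  )

hourly_counts
```

With this sample, the result has one row per hour from 01:00 to 04:00, including a 02:00 row with a count of 0. Note that the sequence only spans the observed range; to cover a whole month regardless of when the first and last purchases happened, you would pass explicit start and end datetimes to seq() instead of min() and max().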

Jon Spring
  • Thanks! This gets me very close. How would I use padr or tsibble to return a "0" value for unmentioned hours? Those exist in this data. – Sepa Jan 03 '19 at 22:25