Here's a tidyverse way to do that that is not a range-based join, but it has some limitations:
- it tallies on the known visits, so if you want specific time ranges (i.e., starting at 6am) then you'll need to bring those in externally with a value of
0
- it uses
lubridate::floor_date
and truncates all entry/exit times to the previous 15-minute interval; I think this will suffice for your needs, but do some checking to make sure all customers are accounted for
library(dplyr)
library(purrr) # map2
library(lubridate) # floor_date
tbl %>%
mutate(times = purrr::map2(entry_time, exit_time, ~ seq(floor_date(.x, unit="15 mins"), floor_date(.y, unit="15 mins"), by="15 mins"))) %>%
select(times) %>%
unnest(times) %>%
group_by(times) %>%
tally()
# # A tibble: 64 x 2
# times n
# <dttm> <int>
# 1 2021-11-09 07:30:00 1
# 2 2021-11-09 07:45:00 2
# 3 2021-11-09 08:00:00 2
# 4 2021-11-09 08:15:00 2
# 5 2021-11-09 08:30:00 2
# 6 2021-11-09 08:45:00 1
# 7 2021-11-09 09:00:00 1
# 8 2021-11-09 09:15:00 1
# 9 2021-11-09 09:30:00 2
# 10 2021-11-09 09:45:00 2
# # ... with 54 more rows
This can also be done with a range-based join. dplyr
doesn't support that directly, and ways to do those joins in dplyr
tend to be inefficient; I recommend sqldf
here, though data.table
and fuzzyjoin
solutions also exist (see Merging two data frames by time range in R).
Reproducible data:
library(dplyr)
library(lubridate)
set.seed(42)
entry_time <- ymd_hms("2021/11/9 6:00:00") + runif(n=20, min=0, max = 14 * 3600)
exit_time <- entry_time + runif(n=20, min=0, max = 5 * 3600)
tbl <- tibble(entry_time, exit_time) %>%
arrange(entry_time) %>%
mutate(customerID = as.character(1:20))
tbl
# # A tibble: 20 x 3
# entry_time exit_time customerID
# <dttm> <dttm> <chr>
# 1 2021-11-09 07:38:41 2021-11-09 08:40:59 1
# 2 2021-11-09 07:53:07 2021-11-09 12:24:50 2
# 3 2021-11-09 09:34:33 2021-11-09 13:00:06 3
# 4 2021-11-09 10:00:21 2021-11-09 14:57:01 4
# 5 2021-11-09 12:24:30 2021-11-09 16:05:46 5
# 6 2021-11-09 12:28:19 2021-11-09 12:29:30 6
# 7 2021-11-09 12:38:59 2021-11-09 17:10:58 7
# 8 2021-11-09 13:16:02 2021-11-09 15:50:18 8
# 9 2021-11-09 13:50:40 2021-11-09 16:54:12 9
# 10 2021-11-09 14:59:03 2021-11-09 15:23:47 10
# 11 2021-11-09 15:11:52 2021-11-09 17:25:57 11
# 12 2021-11-09 15:52:15 2021-11-09 20:03:03 12
# 13 2021-11-09 16:04:03 2021-11-09 20:07:22 13
# 14 2021-11-09 16:18:44 2021-11-09 18:15:47 14
# 15 2021-11-09 17:37:34 2021-11-09 22:21:34 15
# 16 2021-11-09 18:48:26 2021-11-09 23:19:38 16
# 17 2021-11-09 19:05:07 2021-11-09 21:01:33 17
# 18 2021-11-09 19:07:08 2021-11-09 19:48:45 18
# 19 2021-11-09 19:09:36 2021-11-09 23:19:29 19
# 20 2021-11-09 19:41:42 2021-11-09 19:43:54 20
We'll start the output frame's with start and end times (I went just past the last customer, you can control this to "10pm" with literal code).
times <- seq(ymd_hms("2021/11/9 6:00:00"), max(tbl$exit_time) + 30*60, by="15 mins")
times <- tibble(start_time = times[-length(times)], end_time = times[-1])
times
# # A tibble: 71 x 2
# start_time end_time
# <dttm> <dttm>
# 1 2021-11-09 06:00:00 2021-11-09 06:15:00
# 2 2021-11-09 06:15:00 2021-11-09 06:30:00
# 3 2021-11-09 06:30:00 2021-11-09 06:45:00
# 4 2021-11-09 06:45:00 2021-11-09 07:00:00
# 5 2021-11-09 07:00:00 2021-11-09 07:15:00
# 6 2021-11-09 07:15:00 2021-11-09 07:30:00
# 7 2021-11-09 07:30:00 2021-11-09 07:45:00
# 8 2021-11-09 07:45:00 2021-11-09 08:00:00
# 9 2021-11-09 08:00:00 2021-11-09 08:15:00
# 10 2021-11-09 08:15:00 2021-11-09 08:30:00
# # ... with 61 more rows
This is the solution and its (truncated) output:
sqldf::sqldf("
select t.start_time, t.end_time, count(*) as n_customers
from times t
left join tbl on (
tbl.entry_time between t.start_time and t.end_time
or tbl.exit_time between t.start_time and t.end_time
or (tbl.entry_time < t.start_time and tbl.exit_time > t.end_time)
)
where tbl.customerid is not null
group by t.start_time, t.end_time") %>%
mutate(across(c(start_time, end_time), ~ `attr<-`(., "tzone", "UTC"))) %>%
head(.)
# start_time end_time n_customers
# 1 2021-11-09 07:30:00 2021-11-09 07:45:00 1
# 2 2021-11-09 07:45:00 2021-11-09 08:00:00 2
# 3 2021-11-09 08:00:00 2021-11-09 08:15:00 2
# 4 2021-11-09 08:15:00 2021-11-09 08:30:00 2
# 5 2021-11-09 08:30:00 2021-11-09 08:45:00 2
# 6 2021-11-09 08:45:00 2021-11-09 09:00:00 1
# ...
(Admittedly the TZ is modified here, but that's not too hard to reset.)