Suppose that I have the following dataset:
library(data.table)
library(lubridate)
store_DT <- data.table(date = seq.Date(from = as.Date("2019-10-01"),
to = as.Date("2019-10-05"),
by = "day"),
store = c(rep("A",5),rep("B",5)))
date store
1: 2019-10-01 A
2: 2019-10-02 A
3: 2019-10-03 A
4: 2019-10-04 A
5: 2019-10-05 A
6: 2019-10-01 B
7: 2019-10-02 B
8: 2019-10-03 B
9: 2019-10-04 B
10: 2019-10-05 B
which is simply a data.table of store x date observations.
Suppose I have another data.table of employee start and end times (inclusive):
roster_DT <- data.table(
store = c("A", "A", "A", "A", "B", "B","B", "B"),
employee_ID = 1:8,
start_date = c("2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04",
"2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04"),
end_date = c("2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06",
"2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06")
)
store employee_ID start_date end_date
1: A 1 2019-09-30 2019-10-04
2: A 2 2019-10-02 2019-10-04
3: A 3 2019-10-03 2019-10-05
4: A 4 2019-10-04 2019-10-06
5: B 5 2019-09-30 2019-10-04
6: B 6 2019-10-02 2019-10-04
7: B 7 2019-10-03 2019-10-05
8: B 8 2019-10-04 2019-10-06
What I want to do is simply count the number of employees that each store has on any given date, and bring this back to store_DT
. The complication here is that roster_DT
specifies a range of dates. Now, one solution is to simply expand roster_DT
using the advice here. But the actual dataset is quite large, and expanding is not efficient/feasible. So I was wondering if there were any other approaches.
The finalized dataset I am looking for is:
date store employees
1: 2019-10-01 A 1
2: 2019-10-02 A 2
3: 2019-10-03 A 3
4: 2019-10-04 A 4
5: 2019-10-05 A 2
6: 2019-10-01 B 1
7: 2019-10-02 B 2
8: 2019-10-03 B 3
9: 2019-10-04 B 4
10: 2019-10-05 B 2
There are many many stores, and many many employees in my dataset, so I am hoping for a data.table solution.
Thank you so much!