I have a dataset with ~ 330 000 rows. Each observation represents a period where an individual recieved a welfare benefit called "care allowance". The benefit is meant to replace income when the recipient has to be absent from work in order to care for their child full-time due to serious illness or to accompany them to a specialist healthcare institution.
There was a change in legislation regarding the welfare benefit in 2017, and one of my research questions concerns changes in the size and composition of the recipient population. My dataset contains information regarding regarding each case of benefit reception from Jan 1st 2016 to Dec 31 2021.
I want to portray the development in the amount of work days that have been compensated by the care allowance scheme over time. In many cases a period of care allowance reception can span years. I want to count the number of business days (e.g monday through friday) in the interval from the start date and end date of the reception period that falls within each of the years from 2016 to 2021.
I am only able to get the count of ordinary days for each year. I would be very appreciative of suggestions on how to modify my code so that df$bdays == df$days
and the vars(days16:days21)
count the number of business days instead.
Update
@Marcus' suggestion works well enough on a small dataset, but takes an unwieldy amount of time to execute on my larger dataset (over an hour and a half). I've come up with a solution using purrr::map2_dbl()
Original code:
library(bizdays)
library(lubridate)
library(dplyr)
id <- sort(sample(1:100, 1000, replace = T))
start_date <- sample(seq(ymd("2016-01-01"), ymd("2021-12-30"), by="day"), 1000)
end_date <- sample(seq(ymd("2016-01-01"), ymd("2021-12-31"), by="day"), 1000)
df <- data.frame(id, start_date, end_date) %>%
filter(end_date > start_date) %>%
mutate(interval = interval(start = start_date, end = end_date))
df <- df %>%
mutate(days16 = as.period(intersect(interval, interval(ymd("2016-01-01"), ymd("2016-12-31"))))%/%days(1),
days17 = as.period(intersect(interval, interval(ymd("2017-01-01"), ymd("2017-12-31"))))%/%days(1),
days18 = as.period(intersect(interval, interval(ymd("2018-01-01"), ymd("2018-12-31"))))%/%days(1),
days19 = as.period(intersect(interval, interval(ymd("2019-01-01"), ymd("2019-12-31"))))%/%days(1),
days20 = as.period(intersect(interval, interval(ymd("2020-01-01"), ymd("2020-12-31"))))%/%days(1),
days21 = as.period(intersect(interval, interval(ymd("2021-01-01"), ymd("2021-12-31"))))%/%days(1))
df[is.na(df)] <- 0
cal <- create.calendar(name = "mycal", weekdays=c("saturday", "sunday"))
df <- df %>%
mutate(days = days16 + days17 + days18 + days19 + days20 + days21) %>%
mutate(bdays = bizdays(start_date, end_date, cal)) %>%
arrange(id, start_date)
head(df, n = 10)
#> id start_date end_date interval days16 days17 days18
#> 1 1 2016-03-15 2017-04-20 2016-03-15 UTC--2017-04-20 UTC 289 110 0
#> 2 1 2016-07-10 2018-12-14 2016-07-10 UTC--2018-12-14 UTC 173 364 347
#> 3 1 2018-03-06 2021-01-11 2018-03-06 UTC--2021-01-11 UTC 0 0 298
#> 4 1 2018-09-01 2019-04-21 2018-09-01 UTC--2019-04-21 UTC 0 0 121
#> 5 2 2016-04-27 2019-04-28 2016-04-27 UTC--2019-04-28 UTC 247 364 364
#> 6 2 2016-08-13 2019-09-10 2016-08-13 UTC--2019-09-10 UTC 139 364 364
#> 7 2 2016-10-03 2017-10-05 2016-10-03 UTC--2017-10-05 UTC 88 277 0
#> 8 2 2018-05-12 2018-07-17 2018-05-12 UTC--2018-07-17 UTC 0 0 65
#> 9 2 2019-08-29 2021-10-11 2019-08-29 UTC--2021-10-11 UTC 0 0 0
#> 10 2 2019-10-08 2020-08-05 2019-10-08 UTC--2020-08-05 UTC 0 0 0
#> days19 days20 days21 days bdays
#> 1 0 0 0 399 287
#> 2 0 0 0 884 634
#> 3 364 364 10 1036 744
#> 4 111 0 0 232 164
#> 5 118 0 0 1093 782
#> 6 252 0 0 1119 801
#> 7 0 0 0 365 263
#> 8 0 0 0 65 46
#> 9 123 364 283 770 552
#> 10 83 217 0 300 216
Created on 2022-09-30 by the reprex package (v2.0.1)