Using df, I am creating a new data frame (final.df) that has one row for every date between the startdate and enddate of each claim in the df data frame.
```r
df <- data.frame(claimid = c("123A",
                             "125B",
                             "151C",
                             "124A",
                             "325C"),
                 startdate = as.Date(c("2018-01-01",
                                       "2017-05-20",
                                       "2017-12-15",
                                       "2017-11-05",
                                       "2018-02-06")),
                 enddate = as.Date(c("2018-01-06",
                                     "2017-06-21",
                                     "2018-01-02",
                                     "2017-11-15",
                                     "2018-02-18")))
```
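Because seq(startdate, enddate, by = "days") includes both endpoints, each claim contributes enddate - startdate + 1 rows, so the size of final.df can be computed from df directly without building it. A small sketch (n_days is a name introduced here for illustration):

```r
df <- data.frame(claimid = c("123A", "125B", "151C", "124A", "325C"),
                 startdate = as.Date(c("2018-01-01", "2017-05-20", "2017-12-15",
                                       "2017-11-05", "2018-02-06")),
                 enddate = as.Date(c("2018-01-06", "2017-06-21", "2018-01-02",
                                     "2017-11-15", "2018-02-18")))

# Date subtraction gives a difftime in days; + 1 counts both endpoints
n_days <- as.integer(df$enddate - df$startdate) + 1L
n_days       # 6 33 19 11 13
sum(n_days)  # 82 rows in final.df
```

That total is also the number of one-row data frames the nested approach below creates and rbinds, which is likely where most of its run time goes once there are hundreds of thousands of claims.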
The nested functions below are what I'm currently using to create final.df, but when looping over hundreds of thousands of claims, this method takes hours to run. I'm looking for alternatives that create final.df more efficiently.
```r
# For claim a: build one single-row data frame per day, then rbind them
claim_level <- function(a) {
  specific_row <- df[a, ]
  dates <- seq(specific_row$startdate, specific_row$enddate, by = "days")
  # One row for day b of this claim
  day_level <- function(b) {
    day <- dates[b]
    data.frame(claimid = specific_row$claimid, date = day)
  }
  do.call("rbind", lapply(seq_along(dates), day_level))
}

# Repeat for every claim and stack the per-claim results
final.df <- do.call("rbind", lapply(seq_len(nrow(df)), claim_level))
```
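A smaller change that keeps the same overall structure: seq() already returns all of a claim's dates as a vector, and data.frame() recycles a length-one claimid to match it, so the inner day_level() loop can be dropped. This sketch (claim_rows is an illustrative name, and df is re-created so the block runs on its own) builds one data frame per claim instead of one per day:

```r
df <- data.frame(claimid = c("123A", "125B", "151C", "124A", "325C"),
                 startdate = as.Date(c("2018-01-01", "2017-05-20", "2017-12-15",
                                       "2017-11-05", "2018-02-06")),
                 enddate = as.Date(c("2018-01-06", "2017-06-21", "2018-01-02",
                                     "2017-11-15", "2018-02-18")))

# One data frame per claim: the single claimid is recycled across all its dates
claim_rows <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(claimid = df$claimid[i],
             date    = seq(df$startdate[i], df$enddate[i], by = "days"))
})

# A single rbind over nrow(df) pieces instead of one piece per day
final.df <- do.call(rbind, claim_rows)
```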
```r
print(subset(final.df, claimid == "123A"))
# claimid       date
#    123A 2018-01-01
#    123A 2018-01-02
#    123A 2018-01-03
#    123A 2018-01-04
#    123A 2018-01-05
#    123A 2018-01-06
```
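A fully vectorized base R alternative avoids building any intermediate data frames. This sketch (n_days and offsets are illustrative names; df is re-created so the block is self-contained) repeats each claim's id and start date once per day with rep(), then adds a within-claim day offset generated by sequence(), which returns 1:n for each element of its input, concatenated:

```r
df <- data.frame(claimid = c("123A", "125B", "151C", "124A", "325C"),
                 startdate = as.Date(c("2018-01-01", "2017-05-20", "2017-12-15",
                                       "2017-11-05", "2018-02-06")),
                 enddate = as.Date(c("2018-01-06", "2017-06-21", "2018-01-02",
                                     "2017-11-15", "2018-02-18")))

# Days per claim, counting both startdate and enddate
n_days <- as.integer(df$enddate - df$startdate) + 1L

# 0, 1, ..., n - 1 within each claim, all claims concatenated
offsets <- sequence(n_days) - 1L

# Repeat each claim's id and start date once per day, then shift by the offset
final.df <- data.frame(
  claimid = rep(df$claimid, times = n_days),
  date    = rep(df$startdate, times = n_days) + offsets
)
```

Rows come out in the same order as the nested version (by claim, then by date), and the work is a handful of vector operations regardless of how many claims there are.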