I'm attempting to create a separate table from my original data that contains all of the dates between two dates, represented by separate columns in my original table. I have successfully completed this using a loop, but I am sure there's a more efficient means of doing this. In the example data I have provided, there are only 3 row, but the real data set I'm working with has > 500,000. I can't afford inefficiency.
Example:
df <- data.frame(
id = c('A','B','C'),
fromDate = c('2020-01-01','2020-02-01','2020-03-05'),
toDate = c('2020-01-10','2020-02-03','2020-03-06')
)
#output
------------------------------
id fromDate toDate
---- ------------ ------------
A 2020-01-01 2020-01-10
B 2020-02-01 2020-02-03
C 2020-03-05 2020-03-06
------------------------------
#current solution
results <- data.frame(id = NULL,timespan = NULL)
for(i in 1:nrow(df)){
results <- rbind(
results,
data.frame(id = df$id[i], timespan = seq(as.Date(df$fromDate[i]),as.Date(df$toDate[i]),by = 'days'))
)
}
#results
-----------------
id timespan
---- ------------
A 2020-01-01
A 2020-01-02
A 2020-01-03
A 2020-01-04
A 2020-01-05
A 2020-01-06
A 2020-01-07
A 2020-01-08
A 2020-01-09
A 2020-01-10
B 2020-02-01
B 2020-02-02
B 2020-02-03
C 2020-03-05
C 2020-03-06
-----------------
Any suggestions on how to speed this up for scale?