2

I have a data frame that looks like this:

Name    StartDate   EndDate
A   12/12/2012  12/15/2012
B   11/11/2012  11/14/2012

For each row in the above, I want to generate rows with an additional column called "Date" that has a range between the start and end dates above. So, the data frame above would yield another data frame:

Name    StartDate   EndDate Date
A   12/12/2012  12/15/2012  12/12/2012
A   12/12/2012  12/15/2012  12/13/2012
A   12/12/2012  12/15/2012  12/14/2012
A   12/12/2012  12/15/2012  12/15/2012
B   11/11/2012  11/14/2012  11/11/2012
B   11/11/2012  11/14/2012  11/12/2012
B   11/11/2012  11/14/2012  11/13/2012
B   11/11/2012  11/14/2012  11/14/2012

I found the seq.Date() function that can create the list of dates needed, but I'm not sure how to apply this to each row without using a for loop.

Jaap
  • 81,064
  • 34
  • 182
  • 193
V K
  • 21
  • 3

1 Answers1

4

Using the data.table package you could do:

library(data.table)
setDT(df)[, c('StartDate', 'EndDate') := lapply(.SD, as.Date, format = '%m/%d/%Y'), .SDcols = 2:3
          ][, .(date = seq(StartDate, EndDate, 'day')), by = .(Name, StartDate, EndDate)]

which gives:

   Name  StartDate    EndDate       date
1:    A 2012-12-12 2012-12-15 2012-12-12
2:    A 2012-12-12 2012-12-15 2012-12-13
3:    A 2012-12-12 2012-12-15 2012-12-14
4:    A 2012-12-12 2012-12-15 2012-12-15
5:    B 2012-11-11 2012-11-14 2012-11-11
6:    B 2012-11-11 2012-11-14 2012-11-12
7:    B 2012-11-11 2012-11-14 2012-11-13
8:    B 2012-11-11 2012-11-14 2012-11-14

Explanation:

  • The first line of code converts the date columns to a date-format.
  • Group by all columns (by = .(Name,StartDate,EndDate)) and create a sequence with seq(StartDate,EndDate,'day').
Jaap
  • 81,064
  • 34
  • 182
  • 193