I have a data frame where each row has a unique ID. I need to replicate each one of these rows based on the number of days between the start date and the max of the end date and the approval date.
ID <- c(1,2)
Value <- c(10,20)
StartDate <- c(as.Date("01/01/2015", '%d/%m/%Y'),
as.Date("01/01/2015", '%d/%m/%Y'))
EndDate <- c(as.Date("31/01/2015", '%d/%m/%Y'),
as.Date("15/01/2015", '%d/%m/%Y'))
AppDate <- c(as.Date("15/01/2015", '%d/%m/%Y'),
as.Date("15/02/2015", '%d/%m/%Y'))
df <- data.frame(ID, Value, StartDate, EndDate, AppDate)
df <- df[rep(row.names(df), ifelse(as.numeric(df$AppDate) >
as.numeric(df$EndDate),as.numeric(df$AppDate-df$StartDate),
as.numeric(df$EndDate-df$StartDate)) + 1),]
I then need to add a sequential list of dates from the start date to the max of the end date or approval date.
I've done this via 2 loops. The outer loop loops through the data frame for each unique ID. The second loop then goes through the ID and adds the date. Once the second loop has finished it passes the row to the outer loop as the new start point.
IDs <- unique(df$ID)
df$Days <- rep(as.Date("01/01/1999",'%d/%m/%Y'), nrow(df))
counter <- 1
for (i in 1:length(IDs)) {
ref <- IDs[i]
start <- 1
while (df$ID[counter] == ref) {
ifelse(start == 1, df$Days[counter] <- df$StartDate[counter],
df$Days[counter] <- df$StartDate[counter] + start -1)
ifelse (counter > nrow(df), break, counter <- counter + 1)
ifelse (counter > nrow(df), break, start <- start + 1)
}
}
My actual data set has over 6,000 ID's and once I've replicated the rows it ends up being over 500,000 rows. The loop took over 15 minutes to run so it's obviously very inefficient.
So I guess I have 2 questions.
1). What is the most efficient way to do this in R
2). What would be the most efficient way of doing this in general i.e. in say something like C++
thanks