2

I have data with Order Id, Start Date & End Date. I have to split both the Start and End dates into intervals of 30 days, and derive two new variables “split start date” and “split end date”. Example: The below example illustrates how split dates are created when the Start Date is “01/05/2017” and the End Date is “06/07/2017” Suppose, an order have start and end dates as below

see the image for example

What is the code for this problem in R ?

N8888
  • 670
  • 2
  • 14
  • 20
S.Sur
  • 21
  • 2
  • 2
    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not particularly helpful. – MrFlick Jul 04 '18 at 17:16

1 Answers1

0

Here is a solution which should generalize to multiple order id's. I have created a sample data with two order id's. The basic idea is to calculate the number of intervals between start_date and end_date. Then we repeat the row for each order id by the number of intervals, and also create a sequence to determine which interval we are in. This is the purpose of creating functions f and g and the use of Map.

The remaining is just vector manipulations where we define split_start_date and split_end_date. The last statement is to ensure that split_end_date does not exceed end_date.

df <- data.frame(
  order_id = c(1, 2),
  start_date = c(as.Date("2017-05-01"), as.Date("2017-08-01")),
  end_date = c(as.Date("2017-07-06"), as.Date("2017-09-15"))
)
df$diff_days <- as.integer(df$end_date - df$start_date)
df$num_int <- ceiling(df$diff_days / 30)
f <- function(rowindex) {
  rep(rowindex, each = df[rowindex, "num_int"])
}
g <- function(rowindex) {
  1:df[rowindex, "num_int"]
}
rowindex_rep <- unlist(Map(f, 1:nrow(df)))
df2 <- df[rowindex_rep, ]
df2$seq <- unlist(Map(g, 1:nrow(df)))
df3 <- df2
df3$split_start_date <- df3$start_date + (df3$seq - 1) * 30
df3$split_end_date <- df3$split_start_date + 29
df3[which(df3$seq == df3$num_int), ]$split_end_date <-
  df3[which(df3$seq == df3$num_int), ]$end_date
Anindya Mozumdar
  • 483
  • 2
  • 11