3

I tried to use the varies answers from Expanding a sequence in a data frame to my dataframe, but nothing I tried works.

Sample Data

library(dplyr)    
p1 <- c(1:5)
p2 <- as.Date(c("2013-01-01","2013-01-22","2014-02-01","2014-05-12","2015-02-22"))
p3 <- as.Date(c("2013-01-11","2013-01-30","2014-02-20","2014-05-22","2015-02-28"))
p4 <- c(11,9,20,11,7)
df2 <- data_frame(p1,p2,p3,p4)
names(df2) <- c("ID", "StartDate", "EndDate", "NoDays")
df2

Desired Result

ID  datelist    NoDays
1   2013-01-01   1
1   2013-01-02   1 
1   2013-01-03   1
etc..
1   2013-01-10   1
1   2013-01-11   1
2   2013-01-22   1
2   2013-01-23   1
etc.
2   2013-01-28   1
2   2013-01-29   1
2   2013-01-30   1

Here are three code trials - all of which I tried in numerous variants (e.g. verious members of the apply family), but all failed (i.e. giving diverse error messages):

code example 1

datelist <- seq.Date(from = df2$StartDate, to=df2$StartDate, by="days")

code example 2

datelist <- seq.Date(from = df2$StartDate, by="days", length.out = df2$NoDays)

code example 2

datelist <- apply(df2, 1, seq.Date(from = df2$StartDate, to=df2$StartDate, by="days"))
Community
  • 1
  • 1
user2006697
  • 1,107
  • 2
  • 11
  • 25

2 Answers2

4

Your problem is that you give a vector to seq.Date which takes a unique value as from or to.

On the same idea as your apply call it should be:

apply(df2,1,function(x) { seq.Date( as.Date(x['StartDate']), as.Date(x['EndDate']), by='days') } )

Which gives you a list with each row sequence:

[[1]]
 [1] "2013-01-01" "2013-01-02" "2013-01-03" "2013-01-04" "2013-01-05" "2013-01-06" "2013-01-07" "2013-01-08" "2013-01-09"
[10] "2013-01-10" "2013-01-11"

[[2]]
[1] "2013-01-22" "2013-01-23" "2013-01-24" "2013-01-25" "2013-01-26" "2013-01-27" "2013-01-28" "2013-01-29" "2013-01-30"

[[3]]
 [1] "2014-02-01" "2014-02-02" "2014-02-03" "2014-02-04" "2014-02-05" "2014-02-06" "2014-02-07" "2014-02-08" "2014-02-09"
[10] "2014-02-10" "2014-02-11" "2014-02-12" "2014-02-13" "2014-02-14" "2014-02-15" "2014-02-16" "2014-02-17" "2014-02-18"
[19] "2014-02-19" "2014-02-20"

[[4]]
 [1] "2014-05-12" "2014-05-13" "2014-05-14" "2014-05-15" "2014-05-16" "2014-05-17" "2014-05-18" "2014-05-19" "2014-05-20"
[10] "2014-05-21" "2014-05-22"

[[5]]
[1] "2015-02-22" "2015-02-23" "2015-02-24" "2015-02-25" "2015-02-26" "2015-02-27" "2015-02-28"

To get your desired output, we should return id and NoDays columns too.

In base R I would do it like this:

getDfForDates <- function(row) {
  dseq <- seq.Date( as.Date(row['StartDate']), as.Date(row['EndDate']), by='days')
  data.frame( ID=row['ID'], datelist=dseq, NoDays=1)
}

rbindlist(
  apply(df2,1,function(x) { 
    getDfForDates(x)
  } )
)

Another solution with data.table package is:

setDT(df2)
df2[, list(datelist=seq.Date( StartDate, EndDate, by='days'), NoDays=1), by=ID]

Both will give desired result if I didn't miss a point.

I'll see if I can craft a correct dplyr answer as you seems to be using this package. Finally found a dupe when looking for dplyr examples, voted to close.

Tensibai
  • 15,557
  • 1
  • 37
  • 57
  • Hi, thanks for the answer. I use dplyr because I know it best, but i am always open to learn new packages. Thanks a lot again for your help! – user2006697 Feb 03 '16 at 09:52
  • @user2006697 You're welcome, see akrun answer and the duplicate for dplyr syntax. He's far better than me :p – Tensibai Feb 03 '16 at 10:08
  • 2
    @Tensibai I think you know a lot more about other languages. So, I might be lagging behind you. – akrun Feb 03 '16 at 10:14
4

We can do this easily with data.table. Convert the 'data.frame' to 'data.table' (setDT(df2), if the 'ID's are unique, then group by 'ID', and get the seq of 'StartDate' to 'EndDate' by 'ID'.

library(data.table)
res <- setDT(df2)[,list(datelist=seq(StartDate, EndDate, by='1 day'), 
                      NoDays = 1) , by = ID]

head(res)
#   ID   datelist NoDays
#1:  1 2013-01-01      1
#2:  1 2013-01-02      1
#3:  1 2013-01-03      1
#4:  1 2013-01-04      1
#5:  1 2013-01-05      1
#6:  1 2013-01-06      1

If we need to do this in dplyr, we may need the do as mutate doesn't support this kind of operations

library(dplyr)
df2 %>% 
   rowwise() %>%
   do(data.frame(ID= .$ID, datelist = seq(.$StartDate, 
             .$EndDate, by = '1 day'), NoDays= 1))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Hi, thanks a lot!. What if NoDays cannot be fixed but has to be calculated? E.g. like "NoDays / datdiff(StartDate,EndDate)"? Can this be done with the data.table approach? – user2006697 Feb 03 '16 at 09:51
  • @user2006697 Yes, it can be calculated. Just do the difference of `StartDate` and 'EndDate` by the 'ID' – akrun Feb 03 '16 at 09:54
  • 1
    Super - worked at once. Thanks million! – user2006697 Feb 03 '16 at 10:09