2

I have the following data.frame:

df <- data.frame(id=c(1,2,3), 
                 first.date=as.Date(c("2014-01-01", "2014-03-01", "2014-06-01")), 
                 second.date=as.Date(c("2015-01-01", "2015-03-01", "2015-06-1")),
                 third.date=as.Date(c("2016-01-01", "2017-03-01", "2018-06-1")),
                 fourth.date=as.Date(c("2017-01-01", "2018-03-01", "2019-06-1")))

> df

  id first.date second.date third.date fourth.date
1  1 2014-01-01  2015-01-01 2016-01-01  2017-01-01
2  2 2014-03-01  2015-03-01 2017-03-01  2018-03-01
3  3 2014-06-01  2015-06-01 2018-06-01  2019-06-01

Each row represents three timespans; i.e. the time spans between first.date and second.date, second.date and third.date, and third.date and fourth.date respectively.

I would like to, in lack of a better word, unnest the dataframe to obtain this instead:

  id  StartDate    EndDate
1  1 2014-01-01 2015-01-01
2  1 2015-01-01 2016-01-01
3  1 2016-01-01 2017-01-01
4  2 2014-03-01 2015-03-01
5  2 2015-03-01 2017-03-01
6  2 2017-03-01 2018-03-01
7  3 2014-06-01 2015-06-01
8  3 2015-06-01 2018-06-01
9  3 2018-06-01 2019-06-01

I have been playing around with the unnest function from the tidyr package, but I came to the conclusion that I don't think it's what I'm really looking for.

Any suggestions?

Morten Nielsen
  • 325
  • 2
  • 4
  • 19

2 Answers2

4

We can use data.table. We convert the 'data.frame' to 'data.table' (setDT(df)), then melt the dataset to long format, use shift with type='lead' grouped by 'id' and then remove the NA elements.

library(data.table)
na.omit(melt(setDT(df), id.var='id')[, shift(value,0:1, type='lead') , id])
#   id         V1         V2
#1:  1 2014-01-01 2015-01-01
#2:  1 2015-01-01 2016-01-01
#3:  1 2016-01-01 2017-01-01
#4:  2 2014-03-01 2015-03-01
#5:  2 2015-03-01 2017-03-01
#6:  2 2017-03-01 2018-03-01
#7:  3 2014-06-01 2015-06-01
#8:  3 2015-06-01 2018-06-01
#9:  3 2018-06-01 2019-06-01

The column names can be changed by using either setnames or earlier in the shift step.

akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Appreciate this suggestion as well. I wasn't able to translate it to use with my actual data, but that has more to do with my R skills than the quality of the suggestion. – Morten Nielsen Jan 07 '16 at 22:17
4

You can try tidyr/dplyr as follows:

library(tidyr)
library(dplyr)
df %>% gather(DateType, StartDate, -id) %>% select(-DateType) %>% arrange(id) %>% group_by(id) %>% mutate(EndDate = lead(StartDate))

You can eliminate the last row in each id group by adding:

%>% slice(-4)

To the above pipeline.

Gopala
  • 10,363
  • 7
  • 45
  • 77