I have a data frame containing two columns, an identifier and a date. The code below creates a sample data frame.
x <- c(rep(c("a", "b"), each = 10), rep(c("c", "d"), each = 5))
y <- c(seq(as.Date("2014-01-01"), as.Date("2014-01-05"), by = 1),
       as.Date("2014-03-12"),
       as.Date("2014-03-15"),
       seq(as.Date("2014-05-11"), as.Date("2014-05-13"), by = 1),
       seq(as.Date("2014-06-11"), as.Date("2014-06-14"), by = 1),
       seq(as.Date("2014-06-01"), as.Date("2014-06-20"), by = 2),
       seq(as.Date("2014-07-31"), as.Date("2014-08-05"), by = 1))
df <- data.frame(x = x, y = y)
Printing df gives the following (rows 9 to 22 omitted):
   x          y
1  a 2014-01-01
2  a 2014-01-02
3  a 2014-01-03
4  a 2014-01-04
5  a 2014-01-05
6  a 2014-03-12
7  a 2014-03-15
8  a 2014-05-11
.
.
.
23 c 2014-06-17
24 c 2014-06-19
25 c 2014-07-31
26 d 2014-08-01
27 d 2014-08-02
28 d 2014-08-03
29 d 2014-08-04
30 d 2014-08-05
I would like to create another data frame that summarises the date ranges: for each x, there should be one row per run of consecutive dates, taken in the row order of df, giving the first date, the last date, and the number of days in the run. The output I would like (based on the data in df) is the following:
x  start.rng    end.rng  days.rng
a 2014-01-01 2014-01-05         5
a 2014-03-12 2014-03-12         1
a 2014-03-15 2014-03-15         1
a 2014-05-11 2014-05-13         3
b 2014-06-11 2014-06-14         4
b 2014-06-01 2014-06-01         1
b 2014-06-03 2014-06-03         1
b 2014-06-05 2014-06-05         1
b 2014-06-07 2014-06-07         1
b 2014-06-09 2014-06-09         1
b 2014-06-11 2014-06-11         1
c 2014-06-13 2014-06-13         1
c 2014-06-15 2014-06-15         1
c 2014-06-17 2014-06-17         1
c 2014-06-19 2014-06-19         1
c 2014-07-31 2014-07-31         1
d 2014-08-01 2014-08-05         5
I am unable to figure out how to go about this.
Thank you
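For reference, here is a minimal base R sketch of one way the summary described above might be built. It is only an illustration, not a definitive solution. It assumes that a "contiguous set" means consecutive rows of df with the same x whose dates are exactly one day apart, and that rows are processed in their existing order rather than sorted (which is what the desired output implies for b). The names n, new.run, start.idx, end.idx and ranges are made up for this example.

```r
# Rebuild the sample data from the question.
x <- c(rep(c("a", "b"), each = 10), rep(c("c", "d"), each = 5))
y <- c(seq(as.Date("2014-01-01"), as.Date("2014-01-05"), by = 1),
       as.Date("2014-03-12"),
       as.Date("2014-03-15"),
       seq(as.Date("2014-05-11"), as.Date("2014-05-13"), by = 1),
       seq(as.Date("2014-06-11"), as.Date("2014-06-14"), by = 1),
       seq(as.Date("2014-06-01"), as.Date("2014-06-20"), by = 2),
       seq(as.Date("2014-07-31"), as.Date("2014-08-05"), by = 1))
df <- data.frame(x = x, y = y)

# Assumption: rows are used in their existing order (no sorting).
# A new run starts at the first row, whenever x changes, or whenever the
# date is not exactly one day after the previous row's date.
n <- nrow(df)
new.run <- c(TRUE, df$x[-1] != df$x[-n] | diff(df$y) != 1)

# Row index of the first and last row of every run.
start.idx <- which(new.run)
end.idx <- c(start.idx[-1] - 1, n)

# One row per run: identifier, first date, last date, inclusive day count.
ranges <- data.frame(x = df$x[start.idx],
                     start.rng = df$y[start.idx],
                     end.rng = df$y[end.idx])
ranges$days.rng <- as.integer(ranges$end.rng - ranges$start.rng) + 1L

print(ranges)
```

If the dates should instead be grouped after sorting within each x, df would need to be ordered first (for example with df[order(df$x, df$y), ]), and the result for b would then differ from the table above.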