I have the following data frame:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
df <- tibble(
  id = c(1, 1, 2, 2),
  date1 = as.Date(c("2013-01-01", "2013-02-01", "2015-04-01", "2015-05-01")),
  date2 = as.Date(c("2012-12-09", "2012-12-09", "2015-03-10", "2015-03-10"))
)
df
# A tibble: 4 x 3
id date1 date2
<dbl> <date> <date>
1 1 2013-01-01 2012-12-09
2 1 2013-02-01 2012-12-09
3 2 2015-04-01 2015-03-10
4 2 2015-05-01 2015-03-10
I want to complete this data frame so that each id gets one additional date1 value, namely the month following its last existing date1. The date2 value is the same for all rows of a given id and should be carried over to the new row. With tidyr::complete this can be done like this:
df %>%
group_by(id) %>%
complete(date1 = seq.Date(from = min(date1), length.out = 3, by = "month"), date2 = date2[1])
# A tibble: 6 x 3
# Groups: id [2]
id date1 date2
<dbl> <date> <date>
1 1 2013-01-01 2012-12-09
2 1 2013-02-01 2012-12-09
3 1 2013-03-01 2012-12-09
4 2 2015-04-01 2015-03-10
5 2 2015-05-01 2015-03-10
6 2 2015-06-01 2015-03-10
Since I have about 150K groups in my original data, the tidyr solution takes more than an hour to complete. I assume data.table would be faster. Can the same thing be done in data.table?
A similar question has been asked in "data.table equivalent of tidyr::complete()", but without a group_by clause.
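For reference, here is a minimal sketch of how the grouped step might look in data.table. It assumes the same example data as above and rebuilds each group's rows directly instead of joining back to the original, so any columns beyond id, date1 and date2 would be dropped. The names dt and res are only for illustration, and it has not been timed on 150K groups.

```r
library(data.table)

# Same example data as above, as a data.table (dt is an illustrative name)
dt <- data.table(
  id = c(1, 1, 2, 2),
  date1 = as.Date(c("2013-01-01", "2013-02-01", "2015-04-01", "2015-05-01")),
  date2 = as.Date(c("2012-12-09", "2012-12-09", "2015-03-10", "2015-03-10"))
)

# For each id, generate three monthly dates starting at the earliest date1.
# date2[1] has length 1, so it is recycled across the three generated rows.
res <- dt[, .(
  date1 = seq(min(date1), length.out = 3, by = "month"),
  date2 = date2[1]
), by = id]

res
```

On the example data this gives three rows per id. Unlike tidyr::complete, it would not keep existing rows whose date1 falls outside the generated sequence.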