-1
library(dplyr)
df <- tibble(id = c("1", "2", "3", "4"),
             start_date = c("2021-01-01", "2021-01-15", "2021-02-03", "2021-05-20"),
             end_date = c("2021-10-11", "2021-08-17", "2021-12-20", "2021-07-01"))
df
#> # A tibble: 4 × 3
#>   id    start_date end_date  
#>   <chr> <chr>      <chr>     
#> 1 1     2021-01-01 2021-10-11
#> 2 2     2021-01-15 2021-08-17
#> 3 3     2021-02-03 2021-12-20
#> 4 4     2021-05-20 2021-07-01

What I want to get? Subtract end_date - start_date and fill rows with year-month data:

id   start_date   year_month 
1    2021-01      2021-01
1    2021-01      2021-02
............................
1    2021-01      2021-10
2    2021-01      2021-01
............................
2    2021-01      2021-07
2    2021-01      2021-08
3    2021-02      2021-02
3    2021-02      2021-03
............................
3    2021-02      2021-11
3    2021-02      2021-12
4    2021-05      2021-05
4    2021-05      2021-06
4    2021-05      2021-07

How do I add rows to extend my grouped data?

Sasha Poda
  • 105
  • 1
  • 7
  • 1
    Does this answer your question? [Expand rows by date range using start and end date](https://stackoverflow.com/questions/24803361/expand-rows-by-date-range-using-start-and-end-date) – caldwellst Nov 22 '21 at 13:45

1 Answers1

0
library(data.table)
library(lubridate)    
setDT(df)
# Create real dates
df[, start_date := as.Date(start_date)]
df[, end_date := as.Date(end_date)]
# floor start and end month
df[, start_month := lubridate::floor_date(start_date, unit = "month")]
df[, end_month := lubridate::floor_date(end_date, unit = "month")]
# so far we've got:
#    id start_date   end_date start_month  end_month
# 1:  1 2021-01-01 2021-10-11  2021-01-01 2021-10-01
# 2:  2 2021-01-15 2021-08-17  2021-01-01 2021-08-01
# 3:  3 2021-02-03 2021-12-20  2021-02-01 2021-12-01
# 4:  4 2021-05-20 2021-07-01  2021-05-01 2021-07-01

#create sequence by month by id
df[, .(year_month = format(seq(start_month, end_month, by = "1 month"), "%Y-%m")), 
   by = .(id, start_date = format(start_date, "%Y-%m"))]
#    id start_date year_month
# 1:  1    2021-01    2021-01
# 2:  1    2021-01    2021-02
# 3:  1    2021-01    2021-03
# 4:  1    2021-01    2021-04
# 5:  1    2021-01    2021-05
# 6:  1    2021-01    2021-06
# 7:  1    2021-01    2021-07
# 8:  1    2021-01    2021-08
# 9:  1    2021-01    2021-09
#10:  1    2021-01    2021-10
#11:  2    2021-01    2021-01
#12:  2    2021-01    2021-02
#13:  2    2021-01    2021-03
....
Wimpel
  • 26,031
  • 1
  • 20
  • 37