Case:
Given are a dataframe df
with (among others) a startDate
and an endDate
column. My goal is to "transform" df
to a new dataframe df_res
that contains one row for every month occuring in the interval (startDate, endDate)
, with additional columns year
, month
and numberOfDaysInMonth
, all of type int.
Example:
Input: df
id startDate endDate someOtherCol
1 2017-09-23 2018-02-01 val1
2 2018-01-01 2018-03-31 val2
... ... ... ...
Desired output: df_res
id year month numberOfDaysInMonth someOtherCol
1 2017 9 8 val1
1 2017 10 31 val1
1 2017 11 30 val1
1 2017 12 31 val1
1 2018 1 31 val1
1 2018 2 1 val1
2 2018 1 31 val2
2 2018 2 28 val2
2 2018 3 31 val2
... ... ... ... ...
Background:
I am relatively new to R but aware of the great dplyr
and lubridate
packages. I just did not manage to achieve the above in a neat way, even when using those packages. The closest I got was this: Expand rows by date range using start and end date, but that does not yield the number of days per month contained in the range.
Any help is greatly appreciated.