I have two columns holding the start and end dates of every week. I need to aggregate another column (weight) on a monthly basis, taking the mean over the weeks that belong to each month (the dataset covers 3 years), and create a new column containing that monthly weight. The value will therefore be the same for all 5-6 weeks of a month, depending on how many weeks that month has for a particular ID (there are thousands of IDs in the dataset). The tricky part is that some weeks span two months, so one row sometimes has to be included in the calculation for both months, e.g. a row with start_date = 2020-07-27 and end_date = 2020-08-09 has to count towards both July and August. This is my data:
ID | weight | start_date | end_date |
---|---|---|---|
60 | 1,2 | 2019-12-30 | 2020-01-05 |
60 | 1,4 | 2020-01-06 | 2020-01-12 |
60 | 1,3 | 2020-01-13 | 2020-01-19 |
60 | 1,0 | 2020-01-20 | 2020-01-26 |
60 | 3,8 | 2020-01-27 | 2020-02-02 |
61 | 1,7 | 2019-12-30 | 2020-01-05 |
61 | 12,9 | 2020-01-06 | 2020-01-12 |
I want to obtain:
ID | weight | start_date | end_date | Monthly_weight | Month |
---|---|---|---|---|---|
60 | 1,2 | 2019-12-30 | 2020-01-05 | 1,74 | 01.2020 |
60 | 1,4 | 2020-01-06 | 2020-01-12 | 1,74 | 01.2020 |
60 | 1,3 | 2020-01-13 | 2020-01-19 | 1,74 | 01.2020 |
60 | 1,0 | 2020-01-20 | 2020-01-26 | 1,74 | 01.2020 |
60 | 3,8 | 2020-01-27 | 2020-02-02 | 1,74 | 01.2020 |
61 | 1,7 | 2019-12-30 | 2020-01-05 | 7,3 | 01.2020 |
61 | 12,9 | 2020-01-06 | 2020-01-12 | 7,3 | 01.2020 |
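To make the target calculation concrete, here is a minimal base R sketch of what I mean, not a finished solution. It assumes the dates are already of class Date and the weights are numeric (decimal point instead of comma), and it uses a plain unweighted mean. The helper `months_touched` and the data frame `expanded` are names made up for this illustration.

```r
# Sample rows from the table above (weights written with a decimal point).
long_weights <- data.frame(
  ID         = c(60, 60, 60, 60, 60, 61, 61),
  weight     = c(1.2, 1.4, 1.3, 1.0, 3.8, 1.7, 12.9),
  start_date = as.Date(c("2019-12-30", "2020-01-06", "2020-01-13", "2020-01-20",
                         "2020-01-27", "2019-12-30", "2020-01-06")),
  end_date   = as.Date(c("2020-01-05", "2020-01-12", "2020-01-19", "2020-01-26",
                         "2020-02-02", "2020-01-05", "2020-01-12"))
)

# Hypothetical helper: first day of every month touched by [start, end].
months_touched <- function(start, end) {
  first <- as.Date(format(start, "%Y-%m-01"))
  last  <- as.Date(format(end, "%Y-%m-01"))
  seq(first, last, by = "month")
}

# One row per (week, month) pair, so a week spanning two months appears twice.
expanded <- do.call(rbind, lapply(seq_len(nrow(long_weights)), function(i) {
  row <- long_weights[i, ]
  m <- months_touched(row$start_date, row$end_date)
  data.frame(row[rep(1, length(m)), ], Month = format(m, "%m.%Y"), row.names = NULL)
}))

# Mean weight per ID and month, repeated on every contributing row.
expanded$Monthly_weight <- ave(expanded$weight, expanded$ID, expanded$Month, FUN = mean)

# The January view corresponding to the table above.
subset(expanded, Month == "01.2020")
```

Because the first and last weeks span two months, `expanded` also contains rows for 12.2019 and 02.2020. With thousands of IDs the row-by-row `lapply` may be slow, so this is only meant to pin down the expected result.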
At first I wanted to write a loop that detects every month present in either date column and, for each month, takes the mean of the weight column. Then I found a similar problem on Stack Overflow (How to convert weekly data into monthly data?) and decided to do it with zoo.
I tried to implement the solution from that post:
library(zoo)
z.st <- read.zoo(long_weights[c("start_date", "weight")])
z.en <- read.zoo(long_weights[c("end_date", "weight")])
z <- c(z.st, z.en)
g <- zoo(, seq(start(z), end(z), "day"))
m <- na.locf(merge(z, g))
aggregate(m, as.yearmon, mean)
but after this line:
z <- c(z.st, z.en)
I get an error: Error in bind.zoo(...) : indexes overlap
I also tried the following, but it does not take overlapping weeks into account:
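A small reproduction of what I suspect is happening, under the assumption that the error comes from the same date occurring more than once across the series being combined (with several IDs, every start_date is repeated). The objects `a` and `b` are made up for this illustration and stand in for two IDs sharing the same week start dates.

```r
library(zoo)

# Two short series with identical dates, as two different IDs would have.
a <- zoo(c(1.2, 1.4),  as.Date(c("2019-12-30", "2020-01-06")))
b <- zoo(c(1.7, 12.9), as.Date(c("2019-12-30", "2020-01-06")))

# c() on zoo objects binds them by row and refuses duplicated index values.
msg <- tryCatch(
  { c(a, b); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that is the cause, the zoo approach from the linked post would only work on one ID at a time, since it treats the whole data frame as a single time series.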
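library(dplyr)
library(lubridate)
# Assign each week to the month of its midpoint; HHKEY is assumed to be the ID column.
df <- df %>%
  group_by(HHKEY, month = floor_date((as.Date(end_date) - as.Date(start_date)) / 2 + as.Date(start_date), "month")) %>%
  mutate(monthly_weight = mean(weight), .after = end_date) %>%
  ungroup() %>%
  # Format after ungrouping: a grouping column cannot be modified in mutate().
  mutate(month = format(month, "%Y.%m"))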