There are 10 projects split between group A & B, each with different start and end dates. For each day within a given period the sum of outputX and outputY needs to be calculated. I manage to do this for all projects together, but how to split the results per group?
I've made several attempts with lapply() and purrr:map(), also looking at filters and splits, but to no avail. An example that doesn't distinguish between groups is found below.
library(tidyverse)
library(lubridate)
df <- data.frame(
project = 1:10,
group = c("A","B"),
outputX = rnorm(2),
outputY = rnorm(5),
start_date = sample(seq(as.Date('2018-01-3'), as.Date('2018-1-13'), by="day"), 10),
end_date = sample(seq(as.Date('2018-01-13'), as.Date('2018-01-31'), by="day"), 10))
df$interval <- interval(df$start_date, df$end_date)
period <- data.frame(date = seq(as.Date("2018-01-08"), as.Date("2018-01-17"), by = 1))
df_sum <- do.call(rbind, lapply(period$date, function(x){
index <- x %within% df$interval;
list("X" = sum(df$outputX[index]),
"Y" = sum(df$outputY[index]))}))
outcome <- cbind(period, df_sum) %>% gather("id", "value", 2:3)
outcome
Ultimately, it should be a 40x4 table. Some suggestions are much appreciated!