I have a data set where I am creating new features over various time periods. Since I will be using the following dplyr block repeatedly, I would like to wrap it up in a function, but I do not know how to encode the names of the newly mutated predictors to reflect which time period they refer to in my list of period intervals.
library(dplyr)
library(lubridate)
data <- data.frame(custid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                   total = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   date = as.Date(c("2015-01-01", "2015-01-02",
                                    "2015-01-10", "2015-01-11",
                                    "2015-01-21", "2015-01-22",
                                    "2015-01-24", "2015-01-25",
                                    "2015-01-27", "2015-01-28")))
period_intervals <- list(period_one = interval(as.Date("2015-01-01"), as.Date("2015-01-20")),
                         period_two = interval(as.Date("2015-01-21"), as.Date("2015-01-30")))
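compute_period_predictors <- function(data, time_periods) {
  ### Takes a data set and a named list of time periods,
  ### adds aggregated predictors for each time period.
  for (i in seq_along(time_periods)) {
    # Note: df is overwritten on every iteration, so only the last period survives.
    df <- data %>%
      # Use the function argument rather than the global period_intervals.
      filter(date %within% time_periods[[i]]) %>%
      group_by(custid) %>%
      # "period_i" is a literal placeholder; it should become the period's name.
      mutate(period_i_total_mean = mean(total)) %>%
      mutate(period_i_total_sum = sum(total))
  }
  return(df)
}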
Example:
Say I would like to create these two new predictors for time periods named period_45, period_50, and period_60. How would I get the names of the mutated variables to be the concatenated forms period_45_total_mean, period_45_total_sum, period_50_total_mean, etc.?
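For illustration, here is a minimal sketch of one way the naming could be done. It is not necessarily the best approach. It assumes dplyr >= 1.0.0 (for the `.groups` argument) and that the list of intervals is named, so that each name can serve as the column prefix. It also assumes that the desired output keeps every original row and joins the per-customer aggregates back on, which differs from the filtered output of the function above. The column names are built with `paste0()` and injected on the left-hand side of `:=` with `!!`.

```r
library(dplyr)
library(lubridate)

# Same toy data as in the question.
data <- data.frame(
  custid = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
  total  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  date   = as.Date(c("2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11",
                     "2015-01-21", "2015-01-22", "2015-01-24", "2015-01-25",
                     "2015-01-27", "2015-01-28"))
)

period_intervals <- list(
  period_one = interval(as.Date("2015-01-01"), as.Date("2015-01-20")),
  period_two = interval(as.Date("2015-01-21"), as.Date("2015-01-30"))
)

compute_period_predictors <- function(data, time_periods) {
  out <- data
  for (nm in names(time_periods)) {
    # Build the new column names from the name of the current period.
    mean_name <- paste0(nm, "_total_mean")
    sum_name  <- paste0(nm, "_total_sum")

    # Aggregate per customer over the rows that fall inside this period.
    period_stats <- data %>%
      filter(date %within% time_periods[[nm]]) %>%
      group_by(custid) %>%
      summarise(
        !!mean_name := mean(total),
        !!sum_name  := sum(total),
        .groups = "drop"
      )

    # Assumption: keep all rows; customers with no rows in the period get NA.
    out <- left_join(out, period_stats, by = "custid")
  }
  out
}

result <- compute_period_predictors(data, period_intervals)
names(result)
```

With a list whose elements are named period_45, period_50, and period_60, the same function would produce period_45_total_mean, period_45_total_sum, and so on, since the prefix comes straight from `names(time_periods)`.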