I have a data set where I am creating new features over various time periods. Since I will be using the following dplyr block repeatedly, I would like to wrap it up in a function, but I do not know how to encode the names of the newly mutated predictors to reflect which time period they refer to in my list of period intervals.

library(dplyr)
library(lubridate)

data <- data.frame(custid = c(1,1,1,2,2,2,3,3,3,4),
                   total = c(1,2,3,4,5,6,7,8,9,10),
                   date = as.Date(c("2015-01-01", "2015-01-02", 
                                    "2015-01-10", "2015-01-11", 
                                    "2015-01-21", "2015-01-22", 
                                    "2015-01-24", "2015-01-25", 
                                    "2015-01-27", "2015-01-28")))

period_intervals <- list(period_one = interval(as.Date("2015-01-01"), as.Date("2015-01-20")),
                         period_two = interval(as.Date("2015-01-21"), as.Date("2015-01-30")))


compute_period_predictors <- function(data, time_periods){
  ### Takes a data set and a named list of time periods (lubridate intervals),
  ### and adds aggregated predictors for each time period.

  for(i in seq_along(time_periods)){
    df <- data %>%
      filter(date %within% time_periods[[i]]) %>%
      group_by(custid) %>%
      mutate(period_i_total_mean = mean(total)) %>%
      mutate(period_i_total_sum = sum(total))
  }

  return(df)

}

Example:

Say I would like to create these two new predictors for the time periods period_45, period_50, and period_60. How would I get the mutated variable names to be the concatenated forms period_45_total_mean, period_50_total_mean, etc.?
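One possible way to build such names is sketched below. It assumes dplyr 0.7 or later, where a string can be unquoted on the left-hand side of `:=` inside `mutate()`. The helper name `add_period_predictors` and its arguments `period` and `period_name` are hypothetical, introduced only for illustration; the data and the interval are the ones from the question.

```r
library(dplyr)
library(lubridate)

data <- data.frame(custid = c(1,1,1,2,2,2,3,3,3,4),
                   total = c(1,2,3,4,5,6,7,8,9,10),
                   date = as.Date(c("2015-01-01", "2015-01-02",
                                    "2015-01-10", "2015-01-11",
                                    "2015-01-21", "2015-01-22",
                                    "2015-01-24", "2015-01-25",
                                    "2015-01-27", "2015-01-28")))

# Hypothetical helper: handles ONE period and names the new columns after it.
add_period_predictors <- function(data, period, period_name) {
  # Build the column names as plain strings first.
  mean_name <- paste(period_name, "total_mean", sep = "_")
  sum_name  <- paste(period_name, "total_sum", sep = "_")

  data %>%
    filter(date %within% period) %>%
    group_by(custid) %>%
    # `!!name :=` uses the string stored in `name` as the new column name.
    mutate(!!mean_name := mean(total),
           !!sum_name  := sum(total)) %>%
    ungroup()
}

period_one <- interval(as.Date("2015-01-01"), as.Date("2015-01-20"))
out <- add_period_predictors(data, period_one, "period_one")
names(out)
```

Because the name is assembled with `paste()`, any label (for example "period_45") can be passed as `period_name`. Note that this sketch keeps the `filter()` from the question, so rows outside the period are dropped from the result.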

Todd Young
  • Your example does not work. However, I guess you are looking for something like `. %>% mutate_(paste("period", i, "total_sum", sep="_") = sum(total))` – lukeA Oct 10 '16 at 21:29
  • Using `mutate_` I get the following error: `Error: unexpected '=' in: "group_by(custid) %>% mutate_(paste("period", i, "total_sum", sep="_") ="` – Todd Young Oct 10 '16 at 22:26
  • I see. What about `. %>% mutate_(.dots = setNames(list(quote(sum(total))), paste("period", i, "total_sum", sep="_")))`? The [nse vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html) is a good read. – lukeA Oct 10 '16 at 22:47
  • Using `.dots = setNames(list(quote(sum(total))), ...` gets the formatting right, but as each iteration in the for loop within the function runs, it writes over the previous iteration. Running the function over twelve periods only returns two new variables, the sum() and the mean() for the 12th iteration. – Todd Young Oct 11 '16 at 02:01
  • I'm not sure why, but if I remove the for loop from the function and instead run the loop when I call the function in the main program, it creates all the variables with the proper formatting, but it kills all the data in the data frame. – Todd Young Oct 11 '16 at 02:24
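The overwriting described in the last two comments is consistent with `df` being reassigned from scratch on every pass of the loop, so only the final period survives. Losing all rows when looping outside the function is likely the result of filtering already-filtered data by a second, non-overlapping period. A minimal sketch of one way around both problems follows. It assumes dplyr 0.7 or later, summarises each period separately, and joins the result back onto the full data by `custid`, so no rows are dropped. The variable names `result` and `period_summary` are illustrative only.

```r
library(dplyr)
library(lubridate)

data <- data.frame(custid = c(1,1,1,2,2,2,3,3,3,4),
                   total = c(1,2,3,4,5,6,7,8,9,10),
                   date = as.Date(c("2015-01-01", "2015-01-02",
                                    "2015-01-10", "2015-01-11",
                                    "2015-01-21", "2015-01-22",
                                    "2015-01-24", "2015-01-25",
                                    "2015-01-27", "2015-01-28")))

period_intervals <- list(period_one = interval(as.Date("2015-01-01"), as.Date("2015-01-20")),
                         period_two = interval(as.Date("2015-01-21"), as.Date("2015-01-30")))

compute_period_predictors <- function(data, time_periods) {
  # Start from the full data and add columns to it, instead of replacing it.
  result <- data

  for (period_name in names(time_periods)) {
    period    <- time_periods[[period_name]]
    mean_name <- paste(period_name, "total_mean", sep = "_")
    sum_name  <- paste(period_name, "total_sum", sep = "_")

    # One row per customer with activity in this period.
    period_summary <- data %>%
      filter(date %within% period) %>%
      group_by(custid) %>%
      summarise(!!mean_name := mean(total),
                !!sum_name  := sum(total))

    # Customers with no rows in the period get NA in the new columns.
    result <- left_join(result, period_summary, by = "custid")
  }

  result
}

result <- compute_period_predictors(data, period_intervals)
```

The column names come from the names of the list, so a list with elements named period_45, period_50, and period_60 would yield period_45_total_mean, period_50_total_mean, and so on. Whether `NA` is the right value for customers with no activity in a period depends on the model; that choice is an assumption of this sketch.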
