I am trying to estimate a series of ARIMA models in a loop, passing in a different dependent variable on each iteration from a list of dependent variables. I am using the fable package in R to do this, but I can't seem to pass the variable names from the list into the dplyr pipe.

I have a tsibble that looks something like this:

# A tsibble: 320 x 5 [1Q]
# Key:       age, sex [4]
   quarter   age   sex     var var_log
   <yearqtr> <fct> <fct> <dbl>   <dbl>
 1 1990 Q1   18-25 male   50      3.91
 2 1990 Q2   18-25 male   49.9    3.91
 3 1990 Q3   18-25 male   51.1    3.93
 4 1990 Q4   18-25 male   52.6    3.96
 5 1991 Q1   18-25 male   52.1    3.95
 6 1991 Q2   18-25 male   51.4    3.94
 7 1991 Q3   18-25 male   52.0    3.95
 8 1991 Q4   18-25 male   51.2    3.94
 9 1992 Q1   18-25 male   50.8    3.93
10 1992 Q2   18-25 male   51.7    3.95
# ... with 310 more rows

This data was generated with the following code:

library(zoo)     # as.yearqtr()
library(tsibble) # as_tsibble()

set.seed(42)

quarter <- as.yearqtr(seq(as.Date("1990-01-01"), by="quarter", length.out = 80), format = "%Y-%m-%d")
age <- c('18-25', 'Over 25')
sex <- c('male', 'female')

df <- expand.grid(quarter, age, sex)
names(df) <- c('quarter', 'age', 'sex')
df$var <- NA

df[df$age=='18-25' & df$sex== 'male', ]$var <- cumsum(c(50, rnorm(n=nrow(df[df$age=='18-25' & df$sex== 'male', ])-1, mean =.1)))
df[df$age=='18-25' & df$sex== 'female', ]$var <- cumsum(c(60, rnorm(n=nrow(df[df$age=='18-25' & df$sex== 'female', ])-1, mean =.2)))
df[df$age=='Over 25' & df$sex== 'male', ]$var <- cumsum(c(50, rnorm(n=nrow(df[df$age=='Over 25' & df$sex== 'male', ])-1, mean = (-.1))))
df[df$age=='Over 25' & df$sex== 'female', ]$var <- cumsum(c(60, rnorm(n=nrow(df[df$age=='Over 25' & df$sex== 'female', ])-1, mean = (-.2))))

df$var_log <- log(df$var)

df <- as_tsibble(df, index=quarter, key=c('age', 'sex'))

I'm attempting to write a function that takes a list of model specifications as its input and loops through them to repeatedly estimate models, something that looks like this:

select <- dplyr::select

estimate_models <- 
  function(mdl_list) # A list of lists of model specifications
  {
    # This function is a single loop for estimating models. 
    for (i in 1:length(mdl_list)) {
      # Extract model information -----------------------------------------------

      mdl_name <- mdl_list[[i]][["mdl"]] # Name of model
      mdl_type <- sub("_.*","",mdl_name) # Type of model,
      mdl_vars_ari <- mdl_list[[i]][["ari"]] # Contains the dependent variables in ARIMA models


        # ARIMA model estimation --------------------------------------------------

        print(paste0("Estimating ", mdl_name, "..."))
        # Estimate the ARIMA model
        mdl_vars_ari_enquo <- enquo(mdl_vars_ari)
        mdl <- mdl_data %>%
          model(arima = ARIMA(!!mdl_vars_ari_enquo)) %>%
          forecast(h=28) %>% # Forecast 28 periods ahead
          fortify() %>% # Extracts the forecast as a dataframe
          filter(.level==95) %>% # Filter results where the confidence level is 95%
          mutate(mcv_fnl = exp(!!mdl_vars_ari_enquo), quarter = as.Date(quarter, format="%y%m%d")) %>% # Take the exponent and set type of the quarter column to 'Date'
          select(-contains(".")) %>% # Remove extra columns
          rbind(mdl_data) # Rbind fitted values to the model data
      }
    }

The mdl_list, containing the specification details looks something like this:

mdl_list <- list(list(mdl = "arima_model", ari = "var_log", d = "df"))

When trying to run the code I get the following error:

model(mdl_data, arima=ARIMA(!!mdl_vars_ari_enquo))
Warning: 4 errors (1 unique) encountered for arima
[4] Could not find an appropriate ARIMA model.

This seems to be related to the way that the variable name argument in ARIMA(!!mdl_vars_ari_enquo) is being parsed. Passing in var_log works fine. But passing in mdl_vars_ari doesn't work, I assume because of dplyr's non-standard evaluation.

I have read Hadley Wickham's guide here: https://dplyr.tidyverse.org/articles/programming.html but neither quo() nor enquo() seems to do the trick. I have also tried as.name(), but to no avail.
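
A minimal sketch of the suspected string-versus-symbol problem, using only rlang and fitting no model. Here `v` is a hypothetical stand-in for `mdl_vars_ari`, and `expr()` merely builds the call without evaluating it, so `ARIMA` does not need to be loaded. It shows what ends up inside the call when a string is unquoted directly, compared with when it is first converted to a symbol with `sym()`:

```r
library(rlang)

# Hypothetical stand-in for mdl_vars_ari: the column name stored as a string
v <- "var_log"

# Unquoting the string itself inlines a character constant into the call
with_string <- expr(ARIMA(!!v))

# Converting the string to a symbol first yields a reference to the column
with_symbol <- expr(ARIMA(!!sym(v)))

print(with_string) # ARIMA("var_log")
print(with_symbol) # ARIMA(var_log)
```

If fable's ARIMA() honours `!!` the same way dplyr verbs do (an assumption, not verified here), the corresponding change in the loop would be `ARIMA(!!sym(mdl_vars_ari))` in place of `ARIMA(!!mdl_vars_ari_enquo)`.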

Let me know if you need more details to answer my question.

  • Welcome to SO! *Let me know if you need more details to answer my question.* - Yes! We need data to play with. Your data is likely too big to post here, so please create some dummy data for us. Have a look at this thread, https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example especially the bit on how to make data. It is going to take you a bit of time to make data which represents your data more or less - but this makes it much more likely that we will help you. – tjebo Jan 03 '20 at 10:02
  • Thanks Tjebo! I've updated my question to include some dummy data to play with. – W. A. Birdthistle Jan 06 '20 at 01:44
