I am trying to estimate a series of ARIMA models in a loop, passing in a different dependent variable on each iteration from a list of dependent variables. I am trying to use the fable package to do this in R, but I can't seem to pass different variable names from the list into the dplyr pipe.
I have a tsibble that looks something like this:
# A tsibble: 320 x 5 [1Q]
# Key:       age, sex [4]
   quarter   age   sex     var var_log
   <yearqtr> <fct> <fct> <dbl>   <dbl>
 1 1990 Q1   18-25 male   50      3.91
 2 1990 Q2   18-25 male   49.9    3.91
 3 1990 Q3   18-25 male   51.1    3.93
 4 1990 Q4   18-25 male   52.6    3.96
 5 1991 Q1   18-25 male   52.1    3.95
 6 1991 Q2   18-25 male   51.4    3.94
 7 1991 Q3   18-25 male   52.0    3.95
 8 1991 Q4   18-25 male   51.2    3.94
 9 1992 Q1   18-25 male   50.8    3.93
10 1992 Q2   18-25 male   51.7    3.95
# ... with 310 more rows
This data was generated with the following code:
library(zoo)      # as.yearqtr()
library(tsibble)  # as_tsibble()
set.seed(42)
quarter <- as.yearqtr(seq(as.Date("1990-01-01"), by="quarter", length.out = 80), format = "%Y-%m-%d")
age <- c('18-25', 'Over 25')
sex <- c('male', 'female')
df <- expand.grid(quarter, age, sex)
names(df) <- c('quarter', 'age', 'sex')
df$var <- NA
df[df$age=='18-25' & df$sex== 'male', ]$var <- cumsum(c(50, rnorm(n=nrow(df[df$age=='18-25' & df$sex== 'male', ])-1, mean =.1)))
df[df$age=='18-25' & df$sex== 'female', ]$var <- cumsum(c(60, rnorm(n=nrow(df[df$age=='18-25' & df$sex== 'female', ])-1, mean =.2)))
df[df$age=='Over 25' & df$sex== 'male', ]$var <- cumsum(c(50, rnorm(n=nrow(df[df$age=='Over 25' & df$sex== 'male', ])-1, mean = (-.1))))
df[df$age=='Over 25' & df$sex== 'female', ]$var <- cumsum(c(60, rnorm(n=nrow(df[df$age=='Over 25' & df$sex== 'female', ])-1, mean = (-.2))))
df$var_log <- log(df$var)
df <- as_tsibble(df, index=quarter, key=c('age', 'sex'))
I'm attempting to write a function that takes a list of model specifications as its input and loops through them to repeatedly estimate models, something that looks like this:
library(fable)    # model(), ARIMA(), forecast()
library(dplyr)    # %>%, filter(), mutate(), enquo()
library(ggplot2)  # fortify()
select <- dplyr::select
estimate_models <-
  function(mdl_list,  # A list of lists of model specifications
           mdl_data)  # The tsibble to fit on (assumed; this argument was cut off in the snippet as posted)
  {
    # This function is a single loop for estimating models.
    for (i in seq_along(mdl_list)) {
      # Extract model information -----------------------------------------------
      mdl_name <- mdl_list[[i]][["mdl"]]      # Name of model
      mdl_type <- sub("_.*", "", mdl_name)    # Type of model (text before the first underscore)
      mdl_vars_ari <- mdl_list[[i]][["ari"]]  # Contains the dependent variables in ARIMA models
      # ARIMA model estimation --------------------------------------------------
      print(paste0("Estimating ", mdl_name, "..."))
      # Estimate the ARIMA model
      mdl_vars_ari_enquo <- enquo(mdl_vars_ari)
      mdl <- mdl_data %>%
        model(arima = ARIMA(!!mdl_vars_ari_enquo)) %>%
        forecast(h = 28) %>%         # Forecast 28 periods ahead
        fortify() %>%                # Extracts the forecast as a dataframe
        filter(.level == 95) %>%     # Filter results where the confidence level is 95%
        mutate(mcv_fnl = exp(!!mdl_vars_ari_enquo), quarter = as.Date(quarter, format = "%y%m%d")) %>% # Take the exponent and set type of the quarter column to 'Date'
        select(-contains(".")) %>%   # Remove extra columns
        rbind(mdl_data)              # Rbind the forecasts to the model data
    }
  }
The mdl_list, containing the specification details, looks something like this:
mdl_list <- list(list(mdl = "arima_model", ari = "var_log", d = "df"))
When trying to run the code I get the following error:
model(mdl_data, arima=ARIMA(!!mdl_vars_ari_enquo))
Warning: 4 errors (1 unique) encountered for arima
[4] Could not find an appropriate ARIMA model.
This seems to be related to the way that the variable name argument in ARIMA(!!mdl_vars_ari_enquo) is being parsed. Passing in var_log works fine. But passing in mdl_vars_ari doesn't work, I assume because of dplyr's non-standard evaluation.
I have read Hadley Wickham's guide here: https://dplyr.tidyverse.org/articles/programming.html but neither quo() nor enquo() seems to do the trick. I have also tried as.name(), but to no avail.
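To narrow down what the unquoting step produces, here is a minimal sketch that uses only rlang (which dplyr's tidy evaluation builds on). It only captures the call as an expression, so ARIMA() is never actually run and fable does not need to be loaded. It rests on the assumption that the issue is a character string being injected where a bare column name is expected; I have not confirmed that this is what fable trips over.

```r
library(rlang)

mdl_vars_ari <- "var_log"  # the variable name arrives as a character string

# Unquoting the string directly injects a string constant into the call
call_from_string <- expr(ARIMA(!!mdl_vars_ari))
print(call_from_string)

# Converting the string to a symbol first injects a bare name instead
call_from_symbol <- expr(ARIMA(!!sym(mdl_vars_ari)))
print(call_from_symbol)

# Inspect the type of the argument that ended up inside each call
print(is.character(call_from_string[[2]]))
print(is.symbol(call_from_symbol[[2]]))
```

If the two printed calls differ in the way I suspect, that would suggest the string needs to become a symbol before it reaches ARIMA(), but whether that resolves the error inside model() is exactly what I am unsure about.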
Let me know if you need more details to answer my question.