0

Im new with R and I need to compare the accuracy of ARIMAX and ARIMA. This is a sample of my data and what I've done to do the ARIMA model:

library(dplyr)
library(forecast)
library(lubridate)
data<-tibble::tribble(
        ~id, ~day, ~month, ~year, ~value, ~reg1, ~reg2,
         1L,   1L,     1L, 2019L,  4.634, 0.626, 0.684,
         1L,   1L,     2L, 2019L,  2.969, 0.698, 0.049,
         1L,   1L,     3L, 2019L,  1.885,  0.62, 0.155,
         1L,   1L,     4L, 2019L,  2.415, 0.553, 0.959,
         1L,   1L,     5L, 2019L,  2.215, 0.598, 0.065,
         1L,   1L,     6L, 2019L,  1.805, 0.454,  0.07,
         1L,   1L,     7L, 2019L,  4.682, 0.045, 0.376,
         1L,   1L,     8L, 2019L,  4.248, 0.087, 0.094,
         1L,   1L,     9L, 2019L,   0.55, 0.523,  0.86,
         1L,   1L,    10L, 2019L,  0.109, 0.176, 0.591,
         2L,   1L,     1L, 2019L,  2.918, 0.442, 0.956,
         2L,   1L,     2L, 2019L,  3.083, 0.233, 0.388,
         2L,   1L,     3L, 2019L,  3.271, 0.652, 0.946,
         2L,   1L,     4L, 2019L,  2.175, 0.704, 0.902,
         2L,   1L,     5L, 2019L,   4.51, 0.851, 0.533,
         2L,   1L,     6L, 2019L,  4.178, 0.655, 0.614,
         2L,   1L,     7L, 2019L,  1.956, 0.434, 0.977,
         2L,   1L,     8L, 2019L,  3.219, 0.418,   0.4,
         2L,   1L,     9L, 2019L,   2.72, 0.335, 0.096,
         2L,   1L,    10L, 2019L,  4.519, 0.534, 0.388,
         3L,   1L,     1L, 2019L,  2.969, 0.707, 0.752,
         3L,   1L,     2L, 2019L,  2.456, 0.085, 0.651,
         3L,   1L,     3L, 2019L,  0.418, 0.851, 0.399,
         3L,   1L,     4L, 2019L,  2.324, 0.626, 0.317,
         3L,   1L,     5L, 2019L,  3.548, 0.175, 0.081,
         3L,   1L,     6L, 2019L,   3.74, 0.667, 0.691,
         3L,   1L,     7L, 2019L,   4.48, 0.853, 0.259,
         3L,   1L,     8L, 2019L,   0.18, 0.016, 0.489,
         3L,   1L,     9L, 2019L,  3.028,  0.51, 0.741,
         3L,   1L,    10L, 2019L,  4.652, 0.916, 0.953
        )
data<-data %>% 
  mutate(date=as.character(make_date(year,month,day)),YearMonth = tsibble::yearmonth((ymd(date)))) %>%
  as_tsibble(key=id,index = YearMonth)


fit <- data %>% 
  filter(YearMonth <= yearmonth("2019 Aug")) %>%
  model(ARIMA(value ~ PDQ(0,0,0), stepwise=FALSE, approximation=FALSE))

# Now forecast the test set and compute RMSE and MSE
fit %>%
  forecast(h = 2) %>%
  accuracy(data)

Now I need to do this but with an ARIMAX:

covariates <- c("reg1","reg2")
fit_arimax <- data %>% 
  filter(YearMonth <= yearmonth("2019 Aug")) %>%
  group_by(id) %>% 
  do(autoarima=auto.arima(.$value,xreg = as.matrix(data[,covariates])))

and I get the following error:

Error in model.frame.default(formula = x ~ xregg, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'xregg')  

In addition: Warning message: In !is.na(x) & !is.na(rowSums(xreg)) : longer object length is not a multiple of shorter object length

I saw this answer but I couldn't do it, as I'm a beginner in R. So I want to know if ARIMA has something to work with the regressors or how to solve it with auto.arima, and then compare the accuracy by ID in ARIMA and ARIMAX. Does anyone know how to? thanks !

AWM
  • 3
  • 1
  • 1
  • 2

1 Answers1

3

You've switched from using the tsibble and fable packages to using the forecast packages. These use different data structures and should not generally be mixed.

You can easily fit a regression model with ARIMA errors using fable as follows.

fit_arimax <- data %>% 
  filter(YearMonth <= yearmonth("2019 Aug")) %>%
  model(
    ARIMA(value ~ reg1 + reg2 + PDQ(0,0,0))
  ) 

fc <- fit_arimax %>%
  forecast(new_data = filter(data, YearMonth > yearmonth("2019 Aug"))) 
fc %>% accuracy(data)

Note that this is not actually an ARIMAX model -- see https://robjhyndman.com/hyndsight/arimax/

Rob Hyndman
  • 30,301
  • 7
  • 73
  • 85
  • Thanks for the answer! Just one question, how can I see the predicted values? For example if I want the values from fit_arimax for ID 1 I tried using fit_arimax$$`ARIMA(value ~ reg1 + reg2 + PDQ(0,0,0))`[1] but I only got the ARIMA model in text – AWM May 28 '20 at 18:50
  • The forecasts are provided as a column in the `fc` object. They can be extracted as with any dataframe or tibble. – Rob Hyndman May 28 '20 at 22:59
  • Thanks a lot for your answer! One last question, it is possible to add a column with the prediction intervals? For example I want to have an upper and lower using: upper <- fitted(fc) + 1.96*sqrt(fc$sigma2), lower <- fitted(fc) - 1.96*sqrt(fc$sigma2) But I don't know how to find the sigma2 – AWM Jun 07 '20 at 19:29
  • Use the `hilo` function – Rob Hyndman Jun 08 '20 at 01:11