ARIMAX With Multiple Lags

Question

I am using the statsmodels ARIMA package to create some ARIMAX models, with a few different exogenous variables in my prediction. From the documentation, though, it seems the model only uses the current value of each exogenous variable to predict my endogenous variable. Is there a good way of including lagged (older) values of the exogenous variables, up to a certain lag, without adding the lagged exogenous series to the model manually?

To be clear, the solution here is exactly what I want to avoid doing.

Arne Decker · Answer 1 · 2021-12-18T21:56:56.577

I get your point, but this will not work. You can think of ARIMA as a single function, e.g. an ARIMA(2, 0, 1) on a time series f(t) would be something like:

ar.L1 * f(t-1) + ar.L2 * f(t-2) + ma.L1 * e(t-1)

ar.L1 and ar.L2 are the coefficients used for the autoregression on the first and second lag. ma.L1 is the coefficient used for the moving average part (the previous forecast errors are usually named e(t)). These are the names shown in the .summary() of the fitted ARIMA model.

Now, when you add exogenous variables to the ARIMA model, you simply add another term to this function:

ar.L1 * f(t-1) + ar.L2 * f(t-2) + ma.L1 * e(t-1) + coef_exog * exog(t)
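To make the structure concrete, here is a small sketch that just evaluates this expression in plain Python. The coefficient and input values are made up for illustration (they do not come from any fitted model), and the function name `predict` is hypothetical. It shows that a lagged exogenous value only enters the sum if it is supplied as an additional term with its own coefficient.

```python
# Hypothetical coefficients, standing in for what a fitted model would report.
AR_L1, AR_L2, MA_L1 = 0.5, 0.25, 0.5


def predict(f_lags, e_lag, exog_values, exog_coefs):
    """Evaluate the ARIMA(2, 0, 1) expression plus one term per exogenous value."""
    ar_part = AR_L1 * f_lags[0] + AR_L2 * f_lags[1]   # ar.L1 * f(t-1) + ar.L2 * f(t-2)
    ma_part = MA_L1 * e_lag                           # ma.L1 * e(t-1)
    exog_part = sum(c * x for c, x in zip(exog_coefs, exog_values))
    return ar_part + ma_part + exog_part


# Only the current exogenous value exog(t) = 3.0, with coef_exog = 2.0.
current_only = predict((10.0, 8.0), 1.0, [3.0], [2.0])

# Adding exog(t-1) = 4.0 means adding a second term with a second coefficient.
with_lag = predict((10.0, 8.0), 1.0, [3.0, 4.0], [2.0, 1.0])

print(current_only, with_lag)
```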

The ARIMA implementation of statsmodels only allows for a single value per exogenous variable at each time step, because it estimates only one coefficient per exogenous column. Maybe they will change this in the future, but right now passing the exogenous lags as separate columns is the only possibility.

You can create the lagged versions with the pandas .shift() method (note that .diff() computes differences between consecutive rows, not lags).
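A minimal sketch of that manual step, assuming the exogenous variables live in a pandas DataFrame. The helper name `add_lags` and the column name `x` are hypothetical and only used for illustration.

```python
import pandas as pd


def add_lags(exog: pd.DataFrame, max_lag: int) -> pd.DataFrame:
    """Return exog plus a shifted copy of every column for lags 1..max_lag."""
    parts = [exog]
    for lag in range(1, max_lag + 1):
        # shift(lag) moves each value `lag` rows down, so row t holds the value from t - lag.
        parts.append(exog.shift(lag).add_suffix(f"_lag{lag}"))
    return pd.concat(parts, axis=1)


exog = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# The first max_lag rows have no history and contain NaN, so drop them.
lagged = add_lags(exog, max_lag=2).dropna()

print(lagged)
```

Each lagged column then gets its own coefficient when the frame is passed as the exogenous input. Because the first rows are dropped, the endogenous series has to be trimmed to the same index (for example with `endog.loc[lagged.index]`) before fitting.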
