
I ran auto ARIMA in Python and in R on the same data, but the two selected different best models with different AIC values. Why do the two languages give different best models and different AICs?

Data and code for R

wineind <- c(15136., 16733., 20016., 17708., 18019., 19227., 22893., 23739.,
         21133., 22591., 26786., 29740., 15028., 17977., 20008., 21354.,
         19498., 22125., 25817., 28779., 20960., 22254., 27392., 29945.,
         16933., 17892., 20533., 23569., 22417., 22084., 26580., 27454.,
         24081., 23451., 28991., 31386., 16896., 20045., 23471., 21747.,
         25621., 23859., 25500., 30998., 24475., 23145., 29701., 34365.,
         17556., 22077., 25702., 22214., 26886., 23191., 27831., 35406.,
         23195., 25110., 30009., 36242., 18450., 21845., 26488., 22394.,
         28057., 25451., 24872., 33424., 24052., 28449., 33533., 37351.,
         19969., 21701., 26249., 24493., 24603., 26485., 30723., 34569.,
         26689., 26157., 32064., 38870., 21337., 19419., 23166., 28286.,
         24570., 24001., 33151., 24878., 26804., 28967., 33311., 40226.,
         20504., 23060., 23562., 27562., 23940., 24584., 34303., 25517.,
         23494., 29095., 32903., 34379., 16991., 21109., 23740., 25552.,
         21752., 20294., 29009., 25500., 24166., 26960., 31222., 38641.,
         14672., 17543., 25453., 32683., 22449., 22316., 27595., 25451.,
         25421., 25288., 32568., 35110., 16052., 22146., 21198., 19543.,
         22084., 23816., 29961., 26773., 26635., 26972., 30207., 38687.,
         16974., 21697., 24179., 23757., 25013., 24019., 30345., 24488.,
         25156., 25650., 30923., 37240., 17466., 19463., 24352., 26805.,
         25236., 24735., 29356., 31234., 22724., 28496., 32857., 37198.,
         13652., 22784., 23565., 26323., 23779., 27549., 29660., 23356.)
tswineind<-ts(wineind, start=c(1985,1), frequency=12)
library(forecast)
tswineindbest<-auto.arima(tswineind,approximation = FALSE)
tswineindbest

Result for R

ARIMA(0,1,3)(0,1,1)[12]

Data and code for Python

import numpy as np
import pmdarima as pm
from pmdarima.datasets import load_wineind

# this is a dataset from R
wineind = load_wineind().astype(np.float64)

# fit stepwise auto-ARIMA
stepwise_fit = pm.auto_arima(wineind, start_p=1, start_q=1,
                             max_p=3, max_q=3, m=12,
                             start_P=0, seasonal=True,
                             d=1, D=1, trace=True,
                             error_action='ignore',  # don't want to know if an order does not work
                             suppress_warnings=True,  # don't want convergence warnings
                             stepwise=True)  # set to stepwise
stepwise_fit.summary()

Result for Python

    SARIMAX(1, 1, 2)x(0, 1, 1, 12)  AIC 3066.742

I expected the same best model and same AIC for both R and Python.

Daniel James
  • **I expected the same best model and same AIC for both R and Python.** Are you certain that the same algorithm is used in both scenarios? This might be of interest: https://stackoverflow.com/questions/22770352/auto-arima-equivalent-for-python – NelsonGon Jun 28 '19 at 08:14
  • I see one big issue: you don't use the same data. The `wineind` dataset loaded from the `pmdarima` package is not the same as the data you pasted for R... – Nerxis Jul 17 '19 at 15:43
  • Thanks for the observation; I have edited the post. – Daniel James Jul 17 '19 at 20:11

1 Answer

I searched the web and found this Python code very useful:

# import package
import itertools

# Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 3)

# Generate all combinations of (p, d, q) triplets
pdq = list(itertools.product(p, d, q))

# Generate all combinations of seasonal (P, D, Q) triplets, with period 12
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in itertools.product(p, d, q)]

print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
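
To get a feel for how large this grid is, the short sketch below counts the candidates using the same ranges as above. It only illustrates the size of the search space. Note that, unlike the `pm.auto_arima` call in the question (which fixes `d=1, D=1` and searches stepwise), this grid also varies `d` and `D`, and AIC values are generally not comparable across different differencing orders.

```python
import itertools

# Same ranges as the grid above: each of p, d, q takes the values 0, 1, 2
p = d = q = range(0, 3)

# Non-seasonal (p, d, q) triplets and seasonal (P, D, Q, 12) tuples
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

# Every non-seasonal triplet is paired with every seasonal tuple
n_fits = len(pdq) * len(seasonal_pdq)
print(len(pdq), len(seasonal_pdq), n_fits)  # prints: 27 27 729
```

An exhaustive search over this grid therefore fits up to 729 models, which is why it can take noticeably longer than a stepwise search.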

And

import warnings
import statsmodels.api as sm

warnings.filterwarnings("ignore")  # ignore warning messages

# ts is the univariate time series being modelled
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(ts,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)

            results = mod.fit()

            print('SARIMAX{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # skip orders that fail to fit
            continue

The variable to input here is the univariate time series, which I refer to as ts in the Python code. The result is the same as that of auto.arima in R.
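
The loop above only prints each AIC, so the best model still has to be picked out by eye. A minimal sketch of one way to keep track of the lowest AIC is given below. The helper names `sarimax_aic` and `best_by_aic` are hypothetical (they are not part of statsmodels or pmdarima), and the SARIMAX settings simply mirror those used in the loop above.

```python
import itertools
import math


def sarimax_aic(series, order, seasonal_order):
    """Fit one SARIMAX model and return its AIC (requires statsmodels)."""
    # Imported lazily so that best_by_aic can also be used with another scorer
    import statsmodels.api as sm

    model = sm.tsa.statespace.SARIMAX(
        series,
        order=order,
        seasonal_order=seasonal_order,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    return model.fit(disp=False).aic


def best_by_aic(series, orders, seasonal_orders, fit_aic=sarimax_aic):
    """Return (aic, order, seasonal_order) with the lowest finite AIC, or None."""
    best = None
    for order, seasonal_order in itertools.product(orders, seasonal_orders):
        try:
            aic = fit_aic(series, order, seasonal_order)
        except Exception:
            continue  # skip orders that cannot be fitted
        if not math.isfinite(aic):
            continue  # ignore NaN or infinite AIC values
        if best is None or aic < best[0]:
            best = (aic, order, seasonal_order)
    return best
```

With the names from the code above, `best_by_aic(ts, pdq, seasonal_pdq)` would return the lowest AIC found together with its order and seasonal order, or `None` if no model could be fitted.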

Daniel James