I am trying to do forecasting with ARIMA. Currently I select the best ARIMA model and forecast a given period with it. I choose the model by its AIC value, on the basis that the lower the AIC, the better. However, I need a way to verify the result so that I do not have to rely solely on the lowest AIC value. In other words, there should be another way to check that the model I chose is actually giving me the best results.
To give a clear picture, suppose my ARIMA model should give me values between 5 and 10 based on the historical input data, but for some reason the best model by AIC gives me values somewhere around 1000. That is clearly abnormal.
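One crude guard against a forecast like this is a plausibility check against the historical range. The following is a minimal sketch, not part of my current code: the helper name `forecast_is_plausible` and the `tolerance` parameter are hypothetical, and the band (historical min/max widened by a few standard deviations) is an assumption that would be too strict for a strongly trending series.

```python
import numpy as np


def forecast_is_plausible(history, forecast, tolerance=3.0):
    """Return True if every forecast value lies inside the historical
    min/max range widened by `tolerance` standard deviations.

    `tolerance` is an illustrative assumption, not a statistical rule.
    """
    history = np.asarray(history, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    spread = history.std()
    lower = history.min() - tolerance * spread
    upper = history.max() + tolerance * spread
    # NaN forecasts compare as False, so they are flagged as implausible too
    return bool(np.all((forecast >= lower) & (forecast <= upper)))
```

With a history between 5 and 10, a forecast of 8 passes and a forecast of 1000 is flagged, so a model that fails this check could be skipped in favour of the next-best AIC.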
What alternative way is there to verify that the ARIMA model is giving me correct values, apart from the lowest-AIC approach?
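One commonly used alternative is out-of-sample validation: hold back the most recent observations, forecast them, and compare the error against a naive baseline. AIC only ranks models by penalised in-sample fit, so it says nothing directly about forecast accuracy. Below is a minimal sketch of a rolling-origin (walk-forward) evaluation. All function names and the `min_train` default are hypothetical choices for illustration; `sarimax_forecast` assumes statsmodels is installed and imports it only when called.

```python
import numpy as np


def naive_forecast(train, steps):
    """Baseline: repeat the last observed value for every step."""
    return np.repeat(train[-1], steps)


def sarimax_forecast(order):
    """Build a forecaster that refits SARIMAX with a fixed (p, d, q) order."""
    def _forecast(train, steps):
        import statsmodels.api as sm  # lazy import: only needed for this forecaster
        fitted = sm.tsa.SARIMAX(train, order=order).fit(disp=False)
        return np.asarray(fitted.forecast(steps=steps))
    return _forecast


def rolling_origin_mae(values, forecaster, min_train=48, steps=1):
    """Walk forward through the series, refit at each origin, and return
    the mean absolute error of the `steps`-ahead forecasts."""
    values = np.asarray(values, dtype=float)
    errors = []
    for end in range(min_train, len(values) - steps + 1):
        train, actual = values[:end], values[end:end + steps]
        predicted = np.asarray(forecaster(train, steps), dtype=float)
        errors.append(np.mean(np.abs(predicted - actual)))
    return float(np.mean(errors))
```

For example, `rolling_origin_mae(dfn_resampled.to_numpy(), sarimax_forecast((p, 1, q)))` could be compared with `rolling_origin_mae(dfn_resampled.to_numpy(), naive_forecast)`. If the ARIMA error is not lower than the naive error, the low AIC is not translating into better forecasts.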
Following is my code:
import os
import warnings
from itertools import product

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import statsmodels.tsa.api as smt

sns.set()
def arima_ci(df_train):
    df_s = df_train
    param_range = 3
    ps = range(0, param_range)
    d = 1
    qs = range(0, param_range)
    # Create a list with all possible combinations of parameters
    parameters = product(ps, qs)
    parameters_list = list(parameters)

    # Train many ARIMA models to find the best set of parameters
    def optimize_ARIMA(parameters_list, d):
        """
        parameters_list - list with (p, q) tuples
        d - integration order
        """
        results = []
        best_aic = float('inf')
        for param in parameters_list:
            try:
                model = sm.tsa.SARIMAX(df_s, order=(param[0], d, param[1])).fit(disp=-1)
            except Exception:
                # Skip parameter combinations that fail to fit
                continue
            aic = model.aic
            # Save best model, AIC and parameters
            if aic < best_aic:
                best_model = model
                best_aic = aic
                best_param = param
            results.append([param, model.aic])
        result_table = pd.DataFrame(results)
        result_table.columns = ['parameters', 'aic']
        # Sort by AIC in ascending order (lower AIC is better)
        result_table = result_table.sort_values(by='aic', ascending=True).reset_index(drop=True)
        return result_table

    with warnings.catch_warnings():
        warnings.filterwarnings("ignore")  # Ignore all warnings within this block
        result_table = optimize_ARIMA(parameters_list, d)
    # result_table = optimize_ARIMA(parameters_list, d)
    p, q = result_table.parameters[0]
    best_model = sm.tsa.SARIMAX(df_s, order=(p, d, q)).fit(disp=-1)
    # print(best_model.summary())

    # Forecast for the requested period
    n_steps = fcast_period = 1
    forecast_values = best_model.forecast(steps=n_steps)
    # print(forecast_values)

    # Forecast again with get_forecast to obtain 95% confidence intervals
    forecast = best_model.get_forecast(steps=n_steps)
    forecast_values = forecast.predicted_mean
    forecast_ci = forecast.conf_int(alpha=0.05)
    lower_ci = forecast_ci.iloc[:, 0]
    upper_ci = forecast_ci.iloc[:, 1]
    last_date = df_train.index[-1]  # Last date in original data
    next_month = last_date + pd.DateOffset(months=1)  # Next month after last_date

    arima_forecast_df = pd.DataFrame({
        # 'Invoice Date': pd.date_range(start=next_month, periods=n_steps, freq='MS'),
        'arima': forecast_values.astype(int),
        'arima_l': lower_ci.astype(int).apply(lambda x: max(0, x)),  # Lower confidence bound, floored at 0
        'arima_u': upper_ci.astype(int)  # Upper confidence bound
    })
    return arima_forecast_df

result_arima_ci = arima_ci(dfn_resampled)
print(type(result_arima_ci))
result_arima_ci
Here, dfn_resampled is a pandas Series. In simple terms, it is my training data:
# code
dfn_resampled.info()
# output
<class 'pandas.core.series.Series'>
DatetimeIndex: 73 entries, 2017-07-01 to 2023-07-01
Freq: MS
Series name: Quantity
Non-Null Count  Dtype
--------------  -----
73 non-null     int64
dtypes: int64(1)
I am avoiding the auto-arima library as that gave me poor results. Please help me with this.