I'm currently working on a pet project to forecast future base oil prices from historical base oil prices. The data is weekly but there are some periods in between where prices are missing.
I'm somewhat okay with modelling time series with complete data but when it comes to irregular ones, the models that I've learnt may not be applicable. Do I use xts class and proceed with ARIMA models in R the usual way?
After building a model to predict future prices, I'd like to factor in crude oil price fluctuation, diesel profit margin, car sales, economic growth and so on(Multivariable?) to improve accuracy. Can someone shed some light on how do I go about doing this the efficient way? In my mind, it looks like a maze.
EDIT: Trimmed Data here: https://docs.google.com/document/d/18pt4ulTpaVWQhVKn9XJHhQjvKwNI9uQystLL4WYinrY/edit
Coding:
Mod.fit<-arima(Y,order =c(3,2,6), method ="ML")
Result: Warning message: In log(s2) : NaNs produced
Will this warning affect my model accuracy?
With missing data, I can't use ACF and PACF. Is there a better way to select models? I used AIC(Akaike's Information Criterion) to compare different ARIMA models using this code.ARIMA(3,2,6) gave the smallest AIC.
Coding:
AIC<-matrix(0,6,6)
for(p in 0:5)
for(q in 0:5)
{
mod.fit<-arima(Y,order=c(p,2,q))
AIC[p+1,q+1]<-mod.fit$aic
p
}
AIC
Result:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1396.913 1328.481 1327.896 1328.350 1326.057 1325.063
[2,] 1343.925 1326.862 1328.321 1328.644 1325.239 1318.282
[3,] 1334.642 1328.013 1330.005 1327.304 1326.882 1314.239
[4,] 1336.393 1329.954 1324.114 1322.136 1323.567 1316.150
[5,] 1319.137 1321.030 1320.575 1321.287 1323.750 1316.815
[6,] 1321.135 1322.634 1320.115 1323.670 1325.649 1318.015