I am working on a project to forecast store sales as a way to learn forecasting. So far I have successfully used a plain auto.arima() call for forecasting. To make the forecasts more accurate, I want to use covariates. I have defined covariates such as holidays and promotions, which affect store sales, and passed them through the xreg argument with the help of this post: How to setup xreg argument in auto.arima() in R?

But my code fails at line:

ARIMAfit <- auto.arima(saledata, xreg=covariates)

and gives error saying:

Error in model.frame.default(formula = x ~ xreg, drop.unused.levels = TRUE) : variable lengths differ (found for 'xreg') In addition: Warning message: In !is.na(x) & !is.na(rowSums(xreg)) : longer object length is not a multiple of shorter object length

Below is a link to my dataset: https://drive.google.com/file/d/0B-KJYBgmb044blZGSWhHNEoxaHM/view?usp=sharing

This is my code:

data <- read.csv("xdata.csv")[1:96, ]
View(data)

saledata <- ts(data[1:96, 4], start = 1)
View(saledata)

# Replace zero sales with 1
saledata[saledata == 0] <- 1
View(saledata)

# Build the regressor matrix; model.matrix() expands DayOfWeek into dummy columns
covariates <- cbind(DayOfWeek = model.matrix(~ as.factor(data$DayOfWeek)),
                    Customers = data$Customers,
                    Open = data$Open,
                    Promo = data$Promo,
                    SchoolHoliday = data$SchoolHoliday)
View(head(covariates))

# Remove intercept
covariates <- covariates[, -1]
View(covariates)

library(forecast)
ARIMAfit <- auto.arima(saledata, xreg = covariates)  # HERE IS ERROR LINE
summary(ARIMAfit)

Please also tell me how I can forecast the next 48 days. I know how to forecast with a plain auto.arima() model and n.ahead, but I don't know how to do it when xreg is used.
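
The "variable lengths differ" message indicates that the number of rows in xreg does not match the length of the series. As a minimal sketch on synthetic data (the vectors below are hypothetical and not taken from xdata.csv), one way such a mismatch can arise is a missing value in the factor passed to model.matrix(), which silently drops that row. Whether this is what happens with the actual dataset is an assumption that would need checking.

```r
# Synthetic example: a day-of-week column with one missing value
day_of_week <- c(1, 2, NA, 4, 5, 6, 7, 1)
sales <- ts(c(10, 12, 11, 13, 9, 14, 15, 10), start = 1)

# model.matrix() applies the default na.action (na.omit), so the NA row is dropped
dummies <- model.matrix(~ as.factor(day_of_week))

length(sales)   # 8
nrow(dummies)   # 7

# A check like this before calling auto.arima() would flag the mismatch
rows_match <- nrow(dummies) == length(sales)
rows_match      # FALSE
```

Comparing nrow(covariates) with length(saledata) in the same way, right before the failing line, should show whether the two objects really have the same number of observations.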

Raad
ptim ktim

1 Answer

A few points. First, you can convert the entire data frame to a ts object and pick out the individual variables later. Second, if you use covariates in your ARIMA model, you will need to provide them when you forecast out-of-sample. This may mean forecasting each of the covariates before generating forecasts for your variable of interest. In the example below I split the data into two samples for simplicity.

library(forecast)

dta <- read.csv("xdata.csv")[1:96, ]
dta <- ts(dta, start = 1)

# To illustrate out-of-sample forecasting with covariates, split the data
train <- window(dta, end = 90)    # observations 1 to 90
test <- window(dta, start = 91)   # observations 91 to 96

# Fit the model on the training sample
covariates <- c("DayOfWeek", "Customers", "Open", "Promo", "SchoolHoliday")
fit <- auto.arima(train[, "Sales"], xreg = train[, covariates])

# Forecast, supplying the covariates for the forecast period
fcast <- forecast(fit, xreg = test[, covariates])
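
The split above relies on window(), which subsets a ts object by its time index. A small sketch on a synthetic series (the values are made up; only the 96-observation length mirrors the question) shows what end = 90 and start = 91 select:

```r
# Toy daily series with 96 observations, indexed 1..96 (synthetic data)
y <- ts(seq_len(96), start = 1)

train <- window(y, end = 90)    # observations at times 1..90
test  <- window(y, start = 91)  # observations at times 91..96

length(train)  # 90
length(test)   # 6
```

To train on the first 48 observations and hold out the last 48 instead, the same calls would use end = 48 and start = 49.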
Raad
  • Hi NBATrends, I did not understand the second part of the code, where you split the data to illustrate out-of-sample forecasting with covariates. What does the window function do, and why are end=90 and start=91 specified in it? – ptim ktim Dec 13 '15 at 11:55
  • Like I said, in order to forecast out-of-sample when you have covariates in your ARIMA model, you will need to supply their values. In the example above, I am just splitting the data in two for illustration. I use the first sample to fit the model and the covariates from the second to forecast. The window function can be used to subset the data. – Raad Dec 13 '15 at 11:58
  • OK, so the data from 1 to 90 will be used for training the model, and 91 to 96 (6 days) will be used for forecasting those 6 days with their covariates. Right? – ptim ktim Dec 13 '15 at 12:02
  • Noticed I made a mistake, please see the edit above. We train using the first 90 weeks, or 624 days, and then forecast using the covariates from the remaining 42 days. – Raad Dec 13 '15 at 12:11
  • But my data is daily, with 96 observations. How can there be 90 weeks? I am confused. Just tell me: if I want to use the first 48 observations for training and forecast the next 48 days using their available covariates, then for train the end will be 48 and for test the start will be 49 in the above dataset. Right? – ptim ktim Dec 13 '15 at 12:22
  • Look, you had your frequency listed as 7; I didn't stop to look at your data. In that case, yes, you have daily data and what you said holds. But you aren't forced to forecast in-sample; you can forecast out-of-sample as long as you supply new values for each of the covariates. – Raad Dec 13 '15 at 12:26
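
As a sketch of the out-of-sample case discussed in the comments, the example below fits on 96 days and forecasts the next 48, assuming the future covariate values are known in advance. All data are synthetic, and the names promo, dow, and xreg_all are hypothetical stand-ins for the real columns; it is meant as an illustration under those assumptions, not as the exact code for xdata.csv.

```r
library(forecast)

set.seed(1)
n <- 96   # observed days
h <- 48   # days to forecast

# Hypothetical covariates, assumed known for all n + h days
promo <- rbinom(n + h, 1, 0.3)
dow   <- rep(1:7, length.out = n + h)

# Day-of-week dummies; drop the intercept column
dow_dummies <- model.matrix(~ factor(dow))[, -1]
colnames(dow_dummies) <- paste0("Dow", 2:7)
xreg_all <- cbind(Promo = promo, dow_dummies)

# Synthetic sales that respond to the promotion flag
sales <- ts(100 + 20 * promo[1:n] + rnorm(n, sd = 5), start = 1)

# Fit on the observed period
fit <- auto.arima(sales, xreg = xreg_all[1:n, ])

# Forecast: the horizon is taken from the number of rows in the future xreg
fc <- forecast(fit, xreg = xreg_all[(n + 1):(n + h), ])
length(fc$mean)  # 48
```

The future xreg matrix needs the same columns, in the same order, as the one used for fitting. Covariates that are not known ahead of time (Customers, for example) would have to be forecast separately first.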