
I have a dataset with "Time, Region, Sales" variables, and I want to forecast sales for each region using ARIMA or ETS (SES) from library(forecast). There are 70 regions in total, each with 152 weekly observations (about 3 years of data). Something like this:

  Week      Region    Sales 
01/1/2011      A       129
07/1/2011      A       140
14/1/2011      A       133
21/1/2011      A       189
...           ...      ...
01/12/2013     Z       324
07/12/2013     Z       210
14/12/2013     Z       155
21/12/2013     Z       386
28/12/2013     Z       266 

So, I want R to treat every region as a separate dataset and run auto.arima on each one. I'm guessing a for loop would be a good fit here, but my attempts failed miserably. Ideally, the loop would run something like this (one auto.arima per block of 152 observations):

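library(forecast)
fit.A <- auto.arima(data$Sales[1:152])        # rows of region 1
fit.B <- auto.arima(data$Sales[153:304])      # rows of region 2
# ...
fit.Z <- auto.arima(data$Sales[10489:10640])  # rows of region 70 (69*152 + 1 to 70*152)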
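For reference, here is a minimal sketch of that loop that subsets by region name instead of hard-coding row ranges, so it does not depend on the rows being sorted or on every region having exactly 152 rows. The data frame is a hypothetical stand-in (random numbers for three regions), and the `split()` plus `lapply()` version at the end is one loop-free alternative, not the only way.

```r
library(forecast)

# Hypothetical stand-in for the real data: 3 regions x 152 weekly rows
# (random numbers, not real sales)
set.seed(1)
n_obs <- 152
data <- data.frame(
  Region = rep(c("A", "B", "C"), each = n_obs),
  Sales  = round(rnorm(3 * n_obs, mean = 200, sd = 40))
)

# Explicit for loop: one auto.arima() fit per region, stored in a named list.
# as.character() guards against Region being a factor in older R versions.
fits <- list()
for (r in unique(as.character(data$Region))) {
  fits[[r]] <- auto.arima(data$Sales[data$Region == r])
}

# Same result without an explicit loop: split Sales by Region, then lapply
fits2 <- lapply(split(data$Sales, data$Region), auto.arima)
```

Either way, `fits[["A"]]` (or `fits2[["A"]]`) holds the model for region A.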

I came across this, but when converting the data frame into a time series, all I got was NAs.

Any help is appreciated! Thank you.

Shraddha
  • You definitely want to try to avoid `for` loops in `R`. Consider the [`apply` family of functions.](http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/) I'm working on an answer but see what you can do with these! – Al.Sal Jul 23 '14 at 12:35
  • `apply` is a `for` loop – David Arenburg Jul 23 '14 at 12:38
  • True, but good `R` style ought to avoid explicit looping. – Al.Sal Jul 23 '14 at 12:47
  • Thanks for the link @Al.Sal! In this case, I am guessing `base::by` is the best of them all to use, as it splits a dataset by factors. I tried something like `by(data, data$Region, auto.arima(data$Sales))` but I don't think R recognized `auto.arima` as a valid function. – Shraddha Jul 23 '14 at 12:51
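The `by()` attempt fails because its third argument must be a function: `by()` calls it once per group and passes that group's rows in as a data frame. A minimal sketch of the corrected call, using made-up data for two regions (column names follow the question; the numbers are random):

```r
library(forecast)

# Hypothetical example data (random numbers standing in for real sales)
set.seed(42)
data <- data.frame(
  Region = rep(c("A", "B"), each = 152),
  Sales  = round(rnorm(2 * 152, mean = 150, sd = 25))
)

# by() calls FUN once per Region with that region's rows as a data frame,
# so auto.arima has to be wrapped in a function of the subset d, not of data
fits <- by(data, data$Region, function(d) auto.arima(d$Sales))

fits[[1]]  # the model fitted to the first region's rows only (region "A")
```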

2 Answers


Try the very efficient data.table package (assuming your data set is called temp):

library(data.table)
library(forecast)
# one auto.arima() fit per Region, kept in a list column called AR
temp <- setDT(temp)[, list(AR = list(auto.arima(Sales))), by = Region]

The last step stores your results in temp as a list column (a list is the only way to store this type of object in a column).

Afterwards you can do any operation you want on these lists, for example, inspecting them:

temp$AR
#[[1]]
# Series: Sales 
# ARIMA(0,0,0) with non-zero mean 
# 
# Coefficients:
#       intercept
#        147.7500
# s.e.    12.0697
# 
# sigma^2 estimated as 582.7:  log likelihood=-18.41
# AIC=40.82   AICc=52.82   BIC=39.59
#
#[[2]]
# Series: Sales 
# ARIMA(0,0,0) with non-zero mean 
# 
# Coefficients:
#       intercept
#        268.2000
# s.e.    36.4404
# 
# sigma^2 estimated as 6639:  log likelihood=-29.1
# AIC=62.19   AICc=68.19   BIC=61.41

Or plot the forecasts (and so on):

temp[, sapply(AR, function(x) plot(forecast(x, 10)))]
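If you need the forecasts as numbers rather than plots, one option is to call `forecast()` inside another grouped data.table call. This is a sketch on hypothetical random data; it keeps the models in a new table named `models` (a name chosen here for illustration) instead of overwriting `temp`, so the raw data stays available.

```r
library(data.table)
library(forecast)

# Hypothetical example data standing in for temp
set.seed(7)
temp <- data.table(
  Region = rep(c("A", "B"), each = 152),
  Sales  = round(rnorm(2 * 152, mean = 200, sd = 30))
)

# One model per region in a list column, as in the answer above
models <- temp[, list(AR = list(auto.arima(Sales))), by = Region]

# Point forecasts for the next 10 periods: one row per region and horizon.
# Each group of models has exactly one row, so AR[[1]] is that region's model.
fc <- models[, list(h = 1:10,
                    point = as.numeric(forecast(AR[[1]], h = 10)$mean)),
             by = Region]
```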
David Arenburg
  • Thanks @David! That was amazing! But I have a follow-up question. I tried saving the output to a csv file using `write.csv(temp, file="Arima outputs.csv")`, but this generated an error message instead: `unimplemented type 'list' in EncodeElement.` Is there a way to save it apart from printing it in the console? I need to be able to use the model for future forecasts. Thanks again! – Shraddha Jul 23 '14 at 13:27
  • The problem here is that you have list objects, and a csv file doesn't know how to store them. This is unrelated to the solution, as a `list` is the only way to store `auto.arima` objects. You can either compute specific metrics and save them as a data frame, or take a look [here](http://stackoverflow.com/questions/19330949/r-how-to-save-lists-into-csv) – David Arenburg Jul 23 '14 at 13:32
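One common way to keep the fitted models themselves is `saveRDS()`/`readRDS()`, which serialize arbitrary R objects to a binary file; a CSV is then only needed for flat summaries. A sketch under assumptions: `fits` is a hypothetical named list of models built from random data, the file names are illustrative, and the summary columns (ARIMA order and AIC) are just one possible choice of metrics.

```r
library(forecast)

# Hypothetical stand-in for the fitted models: a named list of Arima fits
set.seed(3)
fits <- lapply(split(round(rnorm(304, 200, 30)), rep(c("A", "B"), each = 152)),
               auto.arima)

# 1) Keep the full model objects for later forecasting
path <- file.path(tempdir(), "arima_models.rds")
saveRDS(fits, path)
fits_back <- readRDS(path)              # e.g. in a later R session
fc_A <- forecast(fits_back[["A"]], h = 10)

# 2) Flatten a few per-model metrics into a data frame write.csv can handle
summary_df <- data.frame(
  Region = names(fits),
  p   = sapply(fits, function(m) arimaorder(m)[["p"]]),
  d   = sapply(fits, function(m) arimaorder(m)[["d"]]),
  q   = sapply(fits, function(m) arimaorder(m)[["q"]]),
  AIC = sapply(fits, function(m) m$aic)
)
write.csv(summary_df, file.path(tempdir(), "arima_summary.csv"), row.names = FALSE)
```

The same `saveRDS()` call works on the whole `temp` data.table with its `AR` list column.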

You can do this easily with dplyr. Assuming your data frame is named df, run:

library(dplyr)
library(forecast)
# fit one auto.arima() model per Region; each fit is stored in the list column "fit"
model_fits <- group_by(df, Region) %>% do(fit=auto.arima(.$Sales))

The result is a data frame containing the model fits for each region:

> head(model_fits)
Source: local data frame [6 x 2]
Groups: <by row>

  Region        fit
1      A <S3:Arima>
2      B <S3:Arima>
3      C <S3:Arima>
4      D <S3:Arima>
5      E <S3:Arima>
6      F <S3:Arima>

You can get a list with each model fit like so:

> model_fits$fit
[[1]]
Series: .$Sales 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
      intercept
       196.0000
s.e.    14.4486

sigma^2 estimated as 2088:  log likelihood=-52.41
AIC=108.82   AICc=110.53   BIC=109.42

[[2]]
Series: .$Sales 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
      intercept
       179.2000
s.e.    14.3561

sigma^2 estimated as 2061:  log likelihood=-52.34
AIC=108.69   AICc=110.4   BIC=109.29
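In more recent dplyr versions `do()` is superseded; a roughly equivalent sketch stores each model in a list column with `summarise()` and then forecasts from all of them with `lapply()`. The data frame is hypothetical random data and the 10-step horizon is arbitrary.

```r
library(dplyr)
library(forecast)

# Hypothetical example data frame standing in for df
set.seed(11)
df <- data.frame(
  Region = rep(c("A", "B", "C"), each = 152),
  Sales  = round(rnorm(3 * 152, mean = 190, sd = 45))
)

# summarise() with list() keeps each fitted model in a list column,
# mirroring the do() call above
model_fits <- df %>%
  group_by(Region) %>%
  summarise(fit = list(auto.arima(Sales)))

# Forecast 10 steps ahead from every model, keyed by region
forecasts <- lapply(model_fits$fit, forecast, h = 10)
names(forecasts) <- model_fits$Region
```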
ramhiser
  • That really helped, thanks! Now, if I want to add `freq=7` for weekly seasonality, how do I do that? I can't read the data in time-series format and set the frequency while reading it, since the `ts` function reads only single-variable data, which would mean I can read only the `sales` column and not the `region` column. I tried something like `model_fits <- group_by(df, Region) %>% do(fit=auto.arima(.$Sales),freq=7)` but I don't think it's the right way. – Shraddha Jul 30 '14 at 01:49
  • I also tried reading `sales` as a `timeseries` dataset and taking `region` from the original data frame, something like `model_fits <- group_by(ts.sales, df$Region) %>% do(fit=auto.arima(.$Sales))`, but the error I get is this: `Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "ts"` – Shraddha Jul 30 '14 at 01:55
  • You can specify your own functions to `dplyr` rather than calling `auto.arima` directly. Example: `model_fits <- group_by(df, Region) %>% do(fit=shraddha_wins(.$Sales, freq=7))` Within such a function, you can then apply the logic that you describe above. I'd recommend that you write a function and test it with the `Sales` from the first `Region`. Once the function behaves as desired, then apply the `group_by` operation. – ramhiser Jul 30 '14 at 14:45
  • Thank you for the suggestion, I will try doing that. By the way, +1 for "shraddha_wins" :D – Shraddha Jul 30 '14 at 15:36
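A minimal sketch of that suggestion, with a hypothetical helper `fit_region()` standing in for `shraddha_wins()`: it converts one region's sales to a `ts` with the requested frequency before calling `auto.arima()`. The frequency is an assumption worth checking: with one observation per week, a yearly cycle is about 52 periods, whereas `frequency = 7` describes daily data with a weekly cycle. The data are random placeholders.

```r
library(dplyr)
library(forecast)

# Hypothetical helper (name is illustrative): build a ts with the given
# seasonal frequency from one region's sales, then fit auto.arima
fit_region <- function(sales, freq) {
  auto.arima(ts(sales, frequency = freq))
}

# Hypothetical example data: 2 regions x 152 weekly observations
set.seed(5)
df <- data.frame(
  Region = rep(c("A", "B"), each = 152),
  Sales  = round(rnorm(2 * 152, mean = 250, sd = 50))
)

# Test the helper on the first region alone, as suggested above
fit_A <- fit_region(df$Sales[df$Region == "A"], freq = 52)

# Then apply it to every region
model_fits <- group_by(df, Region) %>% do(fit = fit_region(.$Sales, freq = 52))
```

Because the frequency is passed as an argument, the same helper can be reused with a different value if the data turn out to be daily.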