I have a data series consisting of 2775 elements:

mean(series)
[1] 21.24862
length(series)
[1] 2775
max(series)
[1] 81.22
min(series)
[1] 9.192

I would like to find the best ARIMA model using the auto.arima() function from the forecast package:

library(forecast)
fit <- auto.arima(Netherlands, stepwise = FALSE, approximation = FALSE)

But I have a big problem: RStudio has been running for an hour and a half without producing a result. (I am running this R code on a Windows machine with a 2.80GHz Intel(R) Core(TM) i7 CPU and 16.0 GB of RAM.) I suspect this is due to the length of the time series. Could parallelization be a solution? (I don't know how to apply it.)

Anyway, any suggestions for speeding up this code? Thanks!

Mark

1 Answer

The forecast package has many of its functions built with parallel processing in mind. One of the arguments of the auto.arima() function is 'parallel'.

According to the package documentation, "If [parallel = ] TRUE and stepwise = FALSE, then the specification search is done in parallel. This can give a significant speedup on multicore machines."

With parallel = TRUE, auto.arima() picks the number of cores to use for you. On a laptop or desktop this is usually the number of logical cores, which is often twice the number of physical cores. For example, my machine has 4 physical cores with 2 hardware threads each, so it reports 8 'cores'. To set the number manually, pass the num.cores argument as well.
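
As a rough sketch of how to try this out (not a benchmark), the snippet below fits the same search twice, once serially and once with parallel = TRUE, and times both with system.time(). The simulated series y, the seed, and num.cores = 2 are placeholders I made up for illustration; swap in your own series and core count. Whether the parallel run is actually faster depends on the machine and the series, so it is worth measuring rather than assuming.

```r
library(forecast)

# Hypothetical stand-in for the real data: a short simulated AR(1) series.
set.seed(1)
y <- ts(arima.sim(model = list(ar = 0.6), n = 300))

# How many cores R can see (logical vs. physical).
parallel::detectCores(logical = TRUE)
parallel::detectCores(logical = FALSE)

# Full (non-stepwise) search, run serially.
elapsed_serial <- system.time(
  fit_serial <- auto.arima(y, stepwise = FALSE, approximation = FALSE)
)[["elapsed"]]

# Same search, spread over 2 worker processes (2 is an arbitrary choice).
elapsed_parallel <- system.time(
  fit_parallel <- auto.arima(y, stepwise = FALSE, approximation = FALSE,
                             parallel = TRUE, num.cores = 2)
)[["elapsed"]]

# Compare wall-clock times in seconds.
c(serial = elapsed_serial, parallel = elapsed_parallel)
```

On a short series the cost of starting the worker processes can outweigh the gain, so the parallel timing may well come out higher than the serial one.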

I'd recommend checking out the e-book written by Hyndman all about the package. It is like a time-series forecasting bible.

DanWaters
  • Your statement on `parallel` as regards `auto.arima()` is only true on paper but not in practice. Whenever I use `parallel = TRUE` and `stepwise = FALSE` it takes longer as I measure with `system.time()[[3]]` – Daniel James Sep 13 '20 at 22:11