Loop and bootstrap script takes too long to run

Question

I have the below R script which takes more than 24 hours to but finally runs on Windows 10 of 10-gigabyte ram and core M7. The script does the following:

Here is what I desire to do with R

A. I have generated a 50-time series dataset.
B. I slice the same time series dataset into chunks of the following sizes: 2,3,...,48,49 making me have 48 different time series formed from step 1 above.
C. I divided each 48-time series dataset into train and test sets so I can use rmse function in Metrics package to get the Root Mean Squared Error (RMSE) for the 48 subseries formed in step 2.
D. The RMSE for each series is then tabulated according to their chunk sizes
E. I obtained the best ARIMA model for each 48 different time series data set.

My R Script

# simulate arima(1,0,0)
library(forecast)
library(Metrics)

n=50
phi <- 0.5
set.seed(1)

wn <- rnorm(n, mean=0, sd=1)
ar1 <- sqrt((wn[1])^2/(1-phi^2))

for(i in 2:n){
  ar1[i] <- ar1[i - 1] * phi + wn[i]
}
ts <- ar1

t <- length(ts)    # the length of the time series
li <- seq(n-2)+1   # vector of block sizes to be 1 < l < n (i.e to be between 1 and n exclusively)

# vector to store block means
RMSEblk <- matrix(nrow = 1, ncol = length(li))
colnames(RMSEblk) <-li

for (b in 1:length(li)){
    l <- li[b]# block size
    m <- ceiling(t / l)                                 # number of blocks
    blk <- split(ts, rep(1:m, each=l, length.out = t))  # divides the series into blocks

    # initialize vector to receive result from for loop
    singleblock <- vector()                     
    for(i in 1:1000){
        res<-sample(blk, replace=T, 10000)        # resamples the blocks
        res.unlist<-unlist(res, use.names = F)    # unlist the bootstrap series
        # Split the series into train and test set
        train <- head(res.unlist, round(length(res.unlist) * 0.6))
        h <- length(res.unlist) - length(train)
        test <- tail(res.unlist, h)

        # Forecast for train set
        model <- auto.arima(train)
        future <- forecast(test, model=model,h=h)
        nfuture <- as.numeric(future$mean)        # makes the `future` object a vector            
        RMSE <- rmse(test, nfuture)               # use the `rmse` function from `Metrics` package

        singleblock[i] <- RMSE # Assign RMSE value to final result vector element i
    }

    RMSEblk[b] <- mean(singleblock) # store into matrix
}

RMSEblk

The R script actually runs but it takes more than 24 hours to complete. The number of runs in the loops (10000 and 1000) are the minimum that is necessary to make the task perfect.

Please what can I do to make the script complete in less time?

If I followed your code correctly, you're trying to estimate 48,000 ARIMA models via `auto.arima()`? If I had to guess, that's your bottleneck. A few random thoughts: 1) can you run this in parallel? I don't see any dependencies between each iteration so you could leverage each core available to you. 2) Can you move any of the sample code out of the inner for loop? 3) Are there setting w/i `auto.arima()` you can set to speed up the computation? 4) Have you profiled the code to confirm where the bottleneck is? If not, here's a good article: http://adv-r.had.co.nz/Profiling.html — Chase, May 23 '20 at 23:44
object `li` has values of 2:49 (length of 48). The inner loop iterates 1000 times which is where `auto.arima` is called. So isn't that 48 * 1000 = 48000 calls to `auto.arima`? Regardless, all of my comments above still hold true...for this code to take 24 hours means you are likely growing / iterating an object...R is likely having to reallocate memory 48000 times and on each iteration it has to find a bigger chunk of memory...this is inefficient for obvious reasons. — Chase, May 23 '20 at 23:54
If you don't believe me, stick this above the call to `auto.arima()` and watch how many values are spit out to the screen `print(paste0(b, "-", i))`. — Chase, May 23 '20 at 23:59
given a time series x1, x2, ..., x50. It will be sliced into 1) x1 x2, x3 x4, ..., x49 x50. 2) x1 x2 x3, x4 x5 x6, ..., x48 x49 x50 the the block of fours and fivs and sixs up to block of 49 — Daniel James, May 24 '20 at 00:12
For a slice of twos for example 10,000 block bootstrap will be selected to ma a series of 2*10,000 data point. And for a slice of threes, 10,000 bootstrap will be made to make a series of 3*10,000. It continues like that till a slice of 49 which will be a bootstrap samples of 10,000 making 49*10,000 data points — Daniel James, May 24 '20 at 00:21
Comput the RMSE for the 48 models. Subsequently, repeat the process in my last comment 1000 times (which is the Monte Carlo process). — Daniel James, May 24 '20 at 00:32
Taking over from @Chase: the fundamental problem is that, no matter *why* you're doing it, you are indeed running the `auto.arima()` function 48000 times. — Ben Bolker, May 24 '20 at 01:02

Ben Bolker · Answer 1 · 2020-05-24T01:33:37.923

tl;dr you're probably going to have to parallelize this somehow.

One problem is that you are growing an object; that is, first you're allocating a zero-length vector (singleblock <- vector()), then you're incrementing it one element at a time (singleblock[i] <- RMSE). As discussed in Chapter 2 of the R Inferno, this is super-inefficient. For this sample it's 5x slower.

f1 <- function(x) { p <- numeric(0); for (i in 1:1000) p[i] <- 0 }
f2 <- function(x) { p <- numeric(1000); for (i in 1:1000) p[i] <- 0 }
microbenchmark(f1(),f2())
## Unit: microseconds
##  expr     min       lq      mean  median      uq     max neval cld
##  f1() 202.519 207.2105 249.84095 210.574 221.340 3504.95   100   b
##  f2()  40.274  40.6710  69.83741  40.9615  42.8275 2811.779   100  a

However: that's not really relevant. The inefficient version of this (growing the vector) takes a median time of 210 microseconds.

microbenchmark(auto.arima(train),times=20L)
## Unit: milliseconds
##               expr      min       lq     mean   median       uq      max neval
##  auto.arima(train) 630.7335 648.3471 679.2703 657.6697 668.0563 829.1648    20

Your auto.arima() call takes about 660 milliseconds - about 3000 times longer. Using a similar microbenchmark call for the forecasting step gives a median time of about 20 milliseconds.

You could do more formal profiling, or continue in bits and pieces as shown here, but I don't see anything in your code that looks like it would take a long time (I'd probably check sample() next, but I doubt it's comparable to auto.arima().)

Unless you can find a faster version of auto.arima() (I doubt it), or strip things down (e.g. restrict the search space), your only remaining choice is to parallelize. You can do this at many different levels, with many different tools, but the first place to look would be the parallel option to auto.arima. You might choose instead to parallelize the loop (doing a web search on 'parallel computation in R' gives lots of resources); be aware that trying to parallelize at more than one level can bite you.

PS the crude calculation (48000 * 660 milliseconds) gives about 9 hours - that only accounts for about 1/3 of the time (I would have expected it to get to 80% or so); maybe your processor is slower than mine?

Can you please demonstrate the remedies you provided on the script? — Daniel James, May 24 '20 at 01:40
sorry, I think I've spent as much time on this as I have available right now. These are pretty standard things to do, I've given lots of references/pointers. If you try them and get stuck somewhere, feel free to post a follow-up question. — Ben Bolker, May 24 '20 at 01:42
@DanielJames - here's a good intro on setting up parallel frameworks - I would probably set this up so your cores each take a subset of the outer-loop: https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf. As Ben points out above, finding ways to cut down time on `auto.arima()` will pay biggest dividends. I'm not all that familiar with the guts of `auto.arima()`, but you could check to see if there's an inordinate amount of time taken up in error checking that you could cut out (similar to how `lm()` does a bunch of pre-processing befire fitting the model. — Chase, May 24 '20 at 03:45
@DanielJames - this problem is a great use case for renting a multi-core machine from your cloud provider of choice...I'd figure out how to run your script in parallel and then farm it out to a big instance of EC2: https://aws.amazon.com/blogs/big-data/running-r-on-aws/ — Chase, May 24 '20 at 03:48

score 1 · Accepted Answer · answered May 25 '20 at 00:06

For demonstration, to avoid growing objects in a loop, consider apply family solutions such as vapply. Notice how RMSEblk and singleblock are now directly assigned the result of vapply without bookkeeping of assigning elements by index.

...

# DEFINED METHOD
proc_bootstrap <- function(b) {
    l <- li[b]                                          # block size
    m <- ceiling(t / l)                                 # number of blocks
    blk <- split(ts, rep(1:m, each=l, length.out = t))  # divides the series into blocks

    # initialize vector to receive result from for loop
    singleblock <- vapply(1:1000, function(i) {
      res <- sample(blk, replace=TRUE, 10000)        # resamples the blocks
      res.unlist <- unlist(res, use.names = FALSE)   # unlist the bootstrap series

      # Split the series into train and test set
      train <- head(res.unlist, round(length(res.unlist) * 0.6))
      h <- length(res.unlist) - length(train)
      test <- tail(res.unlist, h)

      # Forecast for train set
      model <- auto.arima(train)
      future <- forecast(test, model=model,h=h)
      nfuture <- as.numeric(future$mean)        # makes the `future` object a vector

      RMSE <- Metrics::rmse(test, nfuture)      # RETURN RMSE
    }, numeric(1))

    mean(singleblock)                           # RETURN MEAN
  }

# VAPPLY CALL
RMSEblk <- vapply(1:length(li), proc_bootstrap, numeric(1))

Alternatively, to fill in your originally defined one-row matrix (maybe better as a named vector?):

# MATRIX to store block means
RMSEblk <- matrix(nrow = 1, ncol = length(li))
colnames(RMSEblk) <-li

RMSEblk[] <- vapply(1:length(li), proc_bootstrap, numeric(1))

Note: Above may not materially differ from nested for loops in timings as you still iterate through 48,000 models calls. Possibly, though, this solution can scale better on larger iterations. But as discussed, look into parallel processing (see parallel, doParallel, foreach packages) which can be translated from for or apply solutions.

Be sure to also profile which shows (outside of modeling calls) unlist, head, tail to have timing issues:

utils::Rprof(tmp <- tempfile(), memory.profiling = TRUE)
RMSEblk <- vapply(1:length(li), proc_bootstrap, numeric(1))
utils::Rprof(NULL)
summaryRprof(tmp, memory="both")
unlink(tmp)

I think this is interesting/useful but also a little bit tangential. (1) As you say (and as I said above), the object-growing problem is actually only a small component of the overall problem. (2) `apply` family solutions don't actually perform any different from `for` loops once you remember to pre-allocate (see R Inferno chapter 4, "Over-vectorizing"). The last bit is the most useful/adds the most to my answer. — Ben Bolker, May 29 '20 at 00:32
Understood, @BenBolker. As I open up, this is for demonstration, arguably a more compact approach than OP's original. But yes, another efficient approach is to initialize objects with length, then assign. And re the age-old R debate of `for` vs `apply` see this older answer: [Is the “*apply” family really not vectorized?](https://stackoverflow.com/q/28983292/1422451) (of course there is no rule of thumb as context matters). — Parfait, May 29 '20 at 01:15
`RMSE` is never used in your script is it omission or intentional? `singleblock[i] <- RMSE # Assign RMSE value to final result vector element i` as I put in the question. in your case, you did defined `RMSE <- Metrics::rmse(test, nfuture) # RETURN RMSE` it in your answer but `RMSE` is never used further than its definition. — Daniel James, Sep 13 '20 at 19:27

Loop and bootstrap script takes too long to run

2 Answers2