
Suppose I have the following data:

library(forecast)
library(lubridate)

set.seed(123)

weeks <- rep(seq(as.Date("2010-01-01"), as.Date("2023-01-01"), by = "week"), each = 1)
counts <- rpois(length(weeks), lambda = 50)
df <- data.frame(Week = as.character(weeks), Count = counts)

# Convert Week column to Date format
df$Week <- as.Date(df$Week)

# Create a time series object
ts_data <- ts(df$Count, frequency = 52, start = c(year(min(df$Week)), 1))

I am trying to learn how to use "Rolling Cross Validation" for Time Series Models (e.g. ARIMA) in R.

As I understand, this involves (ordering the data in chronological order):

  • Fit a model to the first 60 data points, predict the next 5, and record the error
  • Next, fit a model to the first 65 points, predict the next 5, and record the error
  • etc.
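The procedure above can be sketched as a manual loop. This is a rough illustration I put together, not `tsCV()` itself; it assumes `ts_data` from the setup code, an initial window of 60 points, a horizon of 5, and a 5-step jump between training origins, and note that refitting `auto.arima()` at every origin can be slow:

```r
library(forecast)

h <- 5
origins <- seq(60, length(ts_data) - h, by = 5)  # train on first 60, 65, 70, ... points
errors <- matrix(NA, nrow = length(origins), ncol = h)

for (j in seq_along(origins)) {
  train <- subset(ts_data, end = origins[j])        # expanding training window
  fc <- forecast(auto.arima(train), h = h)          # refit ARIMA on each fold
  actual <- ts_data[(origins[j] + 1):(origins[j] + h)]
  errors[j, ] <- actual - fc$mean                   # forecast error for h = 1..5
}

# summarise, e.g. RMSE per horizon:
sqrt(colMeans(errors^2, na.rm = TRUE))
```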

[Image: rolling forecasting origin diagram from Rob Hyndman's Forecasting: Principles and Practice]

Previously, I tried to write the R code for this procedure myself (Correctly understanding loop iterations), but this proved difficult, so I am now interested in whether there are any ready-made implementations of such a procedure.

While searching online, I found the following function that I think might accomplish the desired task: https://search.r-project.org/CRAN/refmans/forecast/html/tsCV.html

I tried to run this function on my data:

# note: I am specifically interested in using the auto.arima() function
far2 <- function(x, h){forecast(auto.arima(x), h=h)}
e <- tsCV(ts_data, far2, h=5)

The code seems to run, but I am not sure whether I am using it correctly.

For example:

  • Is this code fitting a time series model on the first 5 points, predicting the next 5, and recording the error; then fitting a model on the first 10 points, predicting the next 5, recording the error, and so on?

  • When the code finishes running, do I need to calculate the MAE and RMSE errors myself from the output?

Can someone please comment on this?

Thanks!

Note: Would it be better to use something like this instead: https://business-science.github.io/modeltime.resample/articles/getting-started.html ?

  • This question is too open ended. You don't have a programming question, but rather a question about the underlying theory, or which tool might be the best for some unknown application. If you have already reviewed the package's documentation, vignette, and the package author's blog posts, reading through the source code on GitHub may be helpful. If you can focus your question on some specifics about the underlying theory, that might be a good fit for Stack Exchange. – Matt Summersgill Mar 21 '23 at 13:49
  • @Matt Summersgill: thank you for your reply! I just want to understand what this function is supposed to do and how to use it correctly. Thank you so much! – stats_noob Mar 21 '23 at 13:57
  • One correction - when I said "Stack Exchange", I meant to reference the Stack Exchange "Cross Validated" site, which is dedicated to statistics theory. For example: https://stats.stackexchange.com/questions/tagged/r%20cross-validation?sort=MostVotes – Matt Summersgill Mar 21 '23 at 14:03
  • This was already answered in your other question: https://stackoverflow.com/questions/75771637/r-using-window-functions-for-time-series-models/75848332#75848332. Also, when copying images and code from other posts and sources, provide links to them. Asking for opinions on what packages to use is regarded as soliciting opinion-based answers, which is off topic for SO. Please read the information at the top of the [tag:r] tag page. – G. Grothendieck Mar 27 '23 at 12:42

1 Answer


First of all, your description of 5-step-ahead cross-validation is flawed. As the picture you posted reveals, you are referring to Rob Hyndman's nice textbook:

https://otexts.com/fpp3/tscv.html

But what you mention here:

Next, fit a model to the first 65 points, predict the next 5 and record the error

contradicts the picture in the same textbook for multistep ahead forecasting:

https://robjhyndman.com/hyndsight/tscv/

According to the above picture, the training set moves forward by 1 observation each time, irrespective of your forecast horizon! You could move the training window forward by 5 (or some k) steps each time, but then you would have no forecasts for the values in between. Thus, expanding the training window by 1 step each time is more relevant. If you insist on jumping 5 periods ahead each time while training your model, you could achieve it with:


    # Time series cross-validation folds (requires the tsibble package;
    # data_tbl is assumed to be a tsibble whose index/key columns are Date and Symbol)
    data_tbl |>
      stretch_tsibble(.init = 3, .step = 5) |>
      relocate(Date, Symbol, .id)

A detailed explanation of the code above is again at https://otexts.com/fpp3/tscv.html. Basically, the .step argument controls the step size by which the training window jumps each time, while .init is the size of the initial training sample.
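For context, the surrounding fpp3-style workflow would look roughly like the following. This is a sketch using the tsibble, fable, and dplyr packages; data_tbl and the Count column are illustrative names, not from the original post:

```r
library(tsibble)
library(fable)
library(dplyr)

data_tbl |>
  stretch_tsibble(.init = 60, .step = 5) |>   # expanding folds, jumping 5 obs each time
  model(arima = ARIMA(Count)) |>              # fit an ARIMA model per fold
  forecast(h = 5) |>                          # 5-step-ahead forecasts from each fold
  accuracy(data_tbl)                          # MAE, RMSE, etc. across all folds
```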

As for your question about the code:

Indeed, the way you wrote the code ensures that you are making expanding-window forecasts corresponding to the picture I posted. You can convert it to a rolling-window forecast by supplying window = window_length to the tsCV() function, but there is no way in tsCV() to move the training window forward by 5 steps, or by any k > 1 steps, at a time; only a 1-step jump is possible. See the documentation of tsCV():

https://www.rdocumentation.org/packages/forecast/versions/8.21/topics/tsCV

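So a rolling (fixed-length) window version of your call would just add the window argument. Here is a sketch keeping your far2 function; the window length of 60 is an arbitrary choice for illustration:

```r
library(forecast)

far2 <- function(x, h) { forecast(auto.arima(x), h = h) }
e_roll <- tsCV(ts_data, far2, h = 5, window = 60)  # always train on the most recent 60 points
```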

Last but not least, the errors you receive are cross-validation errors obtained with an expanding-window forecasting scheme (exactly as shown in the picture I posted).
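This also answers your second question: tsCV() returns only the matrix of forecast errors (one column per horizon h = 1..5), so the MAE and RMSE have to be computed from it yourself, e.g.:

```r
# e is the matrix returned by tsCV(ts_data, far2, h = 5)
rmse_by_h <- sqrt(colMeans(e^2, na.rm = TRUE))  # RMSE for each horizon
mae_by_h  <- colMeans(abs(e), na.rm = TRUE)     # MAE for each horizon
```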

Edit: There are three types of forecasting schemes: recursive (expanding window), rolling window, and fixed window, explained well in [5].

In time series cross-validation, one uses either a recursive (expanding) window or a rolling window. Hyndman generally refers to both schemes as “evaluation on a rolling forecasting origin”. However, you have to understand that “evaluation on a rolling forecasting origin” can be performed in two ways:

1) keeping the training window size fixed (say, equal to T = 100), referred to as a "rolling forecast" in the above reference;
2) expanding the training window by 1 observation each time, i.e. expanding (recursive) window forecasting.

Sometimes, authors use the term "walk-forward" cross-validation to refer to both schemes generally (e.g. [1], [2]; originally due to [3]).

References:

[1] Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.

[2] Miller, Alan. 2002. Subset Selection in Regression. 2nd ed. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9781420035933.

[3] Hjorth, Urban, and U. Hjort. 1982. “Model Selection and Forward Validation.” Scandinavian Journal of Statistics 9 (2): 95–105.

[4] https://otexts.com/fpp3/tscv.html

[5] Elliott, Graham, and Allan Timmermann. 2008. “Economic Forecasting.” Journal of Economic Literature 46 (1): 3–56. https://doi.org/10.1257/jel.46.1.3.