0

I have a code which takes the input as the Yield Spread (dependent var.) and Forward Rates(independent var.) and operate an auto.arima to get the orders. Afterwards, I am forecasting the next 25 dates (forc.horizon). My training data are the first 600 (training). Then I am moving the time window 25 dates, meaning using the data from 26 to 625, estimating the auto.arima and then forecasting the data from 626 to 650 and so on. My data sets are 2298 rows (date) and 30 columns (maturity).

I want to store all of the forecasts and then plot the forecasted and real values in the same plot.

This is the code I have, but it doesn't store the forecasts in a way to plot later.

forecast.func <- function(NS.spread, ind.v, maturity, training, forc.horizon){
  
  NS.spread <- NS.spread/100
  forc <- c()
  j <- 0
  
  for(i in 1:floor((nrow(NS.spread)-training)/forc.horizon)){
    
    
    # test data
    y <- NS.spread[(1+j):(training+j) , maturity]
    f <- ind.v[(1+j):(training+j) , maturity]
        
    # auto- arima
    c <- auto.arima(y, xreg = f, test= "adf")

    
    # forecast
    e <- ind.v[(training+j+1):(training+j+forc.horizon) , maturity]
    h <- forecast(c, xreg = lagmatrix(e, -1))
    
    forc <- c(forc, list(h))
    
    j <- j + forc.horizon

  }
  
  return(forc)
}


a <- forecast.func(spread.NS.JPM, Forward.rate.JPM, 10, 600, 25)
lapply(a, plot)


Here's a link to my two datasets: https://drive.google.com/drive/folders/1goCxllYHQo3QJ0IdidKbdmfR-DZgrezN?usp=sharing

Agi
  • 73
  • 7

1 Answers1

0

LOOK AT THE END for a full functional example on how to handle AUTO.ARIMA MODEL with DAILY DATA using XREG and FOURIER SERIES with ROLLING STARTING TIMES and cross validated training and test.


  1. Without a reproducible example no one can help you, because they can't run your code. You need to provide data. :-(

  1. Even if it's not part of StackOverflow to discuss statistics matters, why don't you do an auto.arima with xreg instead of lm + auto.arima on residuals? Especially, considering how you forecast at the end, that training method looks really wrong. Consider using:
fit <- auto.arima(y, xreg = lagmatrix(f, -1))
h <- forecast(fit, xreg = lagmatrix(e, -1))

auto.arima will automatically calculate the best parameters by max likelihood.


  1. On your coding question..

forc <- c() should be outside of the for loop, otherwise at every run you delete your previous results.

Same for j <- 0: at every run you're setting it back to 0. Put it outside if you need to change its value at every run.

The output of forecast is an object of class forecast, which is actually a type of list. Therefore, you can't use cbind effectively.

I'm my opinion, you should create forc in this way: forc <- list()

And create a list of your final results in this way:

forc <- c(forc, list(h)) # instead of forc <- cbind(forc, h)

This will create a list of objects of class forecast.

You can then plot them with a for loop by getting access at every object or with a lapply.

lapply(output_of_your_function, plot)

This is as far as I can go without a reproducible example.


FINAL EDIT

FULL FUNCTIONAL EXAMPLE

Here I try to sum up a conclusion out of the million comments we wrote.

With the data you provided, I built a code that can handle everything you need.

From training and test to model, till forecast and finally plotting which have the X axis with the time as required in one of your comments.

I removed the for loop. lapply is much better for your case.

You can leave the fourier series if you want to. That's how Professor Hyndman suggests to handle daily time series.

Functions and libraries needed:

# libraries ---------------------------

library(forecast)
library(lubridate)


# run model -------------------------------------

.daily_arima_forecast <- function(init, training, horizon, tt, ..., K = 10){
    
    # create training and test
    tt_trn <- window(tt, start = time(tt)[init]           , end = time(tt)[init + training - 1])
    tt_tst <- window(tt, start = time(tt)[init + training], end = time(tt)[init + training + horizon - 1])
    
    # add fourier series [if you want to. Otherwise, cancel this part]
    fr  <- fourier(tt_trn[,1], K = K)
    frf <- fourier(tt_trn[,1], K = K, h = horizon)
    tsp(fr)  <- tsp(tt_trn)
    tsp(frf) <- tsp(tt_tst)
    tt_trn <- ts.intersect(tt_trn, fr)
    tt_tst <- ts.intersect(tt_tst, frf)
    colnames(tt_tst) <- colnames(tt_trn) <- c("y", "s", paste0("k", seq_len(ncol(fr))))
    
    # run model and forecast
    aa <- auto.arima(tt_trn[,1], xreg = tt_trn[,-1])
    fcst <- forecast(aa, xreg = tt_tst[,-1])
    
    # add actual values to plot them later!
    fcst$test.values <- tt_tst[,1]
    
    # NOTE: since I modified the structure of the class forecast I should create a new class,
    # but I didnt want to complicate your code
    fcst
    
}


daily_arima_forecast <- function(y, x, training, horizon, ...){
    
    # set up x and y together
    tt <- ts.intersect(y, x)
    
    # set up all starting point of the training set [give it a name to recognize them later]
    inits <- setNames(nm = seq(1, length(y) - training, by = horizon))
    
    # remove last one because you wouldnt have enough data in front of it
    inits <- inits[-length(inits)]
    
    # run model and return a list of all your models
    lapply(inits, .daily_arima_forecast, training = training, horizon = horizon, tt = tt, ...)
    
    
}


# plot ------------------------------------------

plot_daily_forecast <- function(x){
    
    autoplot(x) + autolayer(x$test.values)
    
}

Reproducible Example on how to use the previous functions

# create a sample data
tsp(EuStockMarkets) <- c(1991, 1991 + (1860-1)/365.25, 365.25)

# model
models <- daily_arima_forecast(y            = EuStockMarkets[,1], 
                               x            = EuStockMarkets[,2],
                               training     = 600, 
                               horizon      = 25, 
                               K            = 5)


# plot
plots <- lapply(models, plot_daily_forecast)

plots[[1]]

enter image description here

Example for the author of the post

# your data
load("BVIS0157_Forward.rda")
load("BVIS0157_NS.spread.rda")

spread.NS.JPM <- spread.NS.JPM / 100


# pre-work [out of function!!!]
set_up_ts <- function(m){
    
    start <- min(row.names(m))
    end   <- max(row.names(m))
    
    # daily sequence
    inds <- seq(as.Date(start), as.Date(end), by = "day")
    
    ts(m, start = c(year(start), as.numeric(format(inds[1], "%j"))), frequency = 365.25)
    
}

mts_spread.NS.JPM    <- set_up_ts(spread.NS.JPM)
mts_Forward.rate.JPM <- set_up_ts(Forward.rate.JPM)

# model
col <- 10
models <- daily_arima_forecast(y            = mts_spread.NS.JPM[, col], 
                               x            = stats::lag(mts_Forward.rate.JPM[, col], -1),
                               training     = 600, 
                               horizon      = 25, 
                               K            = 5) # notice that K falls between ... that goes directly to the inner function


# plot
plots <- lapply(models, plot_daily_forecast)

plots[[5]]

enter image description here

Edo
  • 7,567
  • 2
  • 9
  • 19
  • Regarding using the auto.arima function with xreg, I have tried that but I am not getting the same ARIMA(p,d,q) model which I am getting by putting the orders in the Arima function. I think it is because I run the auto.arima with the residuals. Also, my variables are stationary so I shouldn't get a d term but when using auto.arima directly as you mentioned, I am getting a d term. – Agi Aug 14 '20 at 16:58
  • with `auto.arima` you can force to apply NO differentiation: set the argument `d = 0` – Edo Aug 14 '20 at 17:01
  • They should put out the same output, if `lm`+`arima` and `auto.arima` are doing the same thing, but they don't. I am not sure if by using only `auto.arima` it runs the model for residuals. Because I run the `auto.arim` on the residuals and then put on the `Arima`. – Agi Aug 14 '20 at 17:07
  • They don't do the same thing indeed. check this out https://robjhyndman.com/hyndsight/arimax/ – Edo Aug 14 '20 at 17:14
  • It says they fit a regression with ARIMA error, which is initially what I am doing. – Agi Aug 14 '20 at 17:18
  • As I said in my answer, `auto.arima` calculates the parameters all together using ML. It doesn't do a sort of `lm` + `Arima` kind of thing.. If you need more information, I suggest you to ask for more details about this on https://stats.stackexchange.com/. That's the right place for statistical questions. – Edo Aug 14 '20 at 17:27
  • Except for the statistical matter, is the coding problem solved with my answer? – Edo Aug 14 '20 at 21:12
  • I still can't plot it. I get the values but even when I am doing: ```a <- forecast.func(spread.NS.JPM, Forward.rate.JPM, 10, 600, 25) lapply(a$x, plot) lapply(a$fitted, lines, col= "red") ``` it gives me the values up to the 600th value though it should give me up to 650, if I ran it for only two loops. – Agi Aug 17 '20 at 16:51
  • That code doesn't look right.. just run `lapply(a, plot)` and you'll have a proper forecast plot. The issue with 650 is not clear and I can't help you more without a reproducible example. – Edo Aug 17 '20 at 17:05
  • I am getting an error message when I run the ```lapply(a,plot)```. It is: Error in plot.window(...) : need finite 'ylim' values. I have the .rda files but I'm not sure how to share them. – Agi Aug 17 '20 at 17:14
  • Then I need you to create a reproducible example. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Edo Aug 17 '20 at 17:15
  • Here is the link to the two datasets needed: https://drive.google.com/drive/folders/1goCxllYHQo3QJ0IdidKbdmfR-DZgrezN?usp=sharing – Agi Aug 17 '20 at 17:26
  • That's not a reproducible example. Please see the previous link. – Edo Aug 17 '20 at 17:32
  • I am not able to construct the datasets as the ones mentioned there. You can just run the two files in the R studio with the code. – Agi Aug 17 '20 at 18:08
  • Share the code where you read the files and call the function – Edo Aug 17 '20 at 18:40
  • If you just double click the files, it would open in R studio and you can see it in the environment of R. I added the line where I call the function. – Agi Aug 17 '20 at 18:48
  • What libraries do you use? – Edo Aug 17 '20 at 20:24
  • ```library("tsutils")``` and ```library("forecast")``` – Agi Aug 18 '20 at 16:00
  • I tried to run your code with your data, but `forecast.func` stops with an error in `solve.default(res$hessian * n.used, A): the system is numerically singular`. You need to review your code. Try to simplify your problem and make it work with a smaller sample of data. When you have a reproducible example, it would be easier for you to find the problem and to act upon it. – Edo Aug 18 '20 at 16:09
  • I ran it again and it runs fine, though doesn't store the data somehow. Thanks anyway. – Agi Aug 18 '20 at 23:28
  • I just noticed that I wrote `forc <- c(forc, h)` instead of `forc <- c(forc, list(h))`. Try it out – Edo Aug 19 '20 at 08:20
  • also.. what code are you running right now? is it the same but with those 2 or 3 corrections I wrote? – Edo Aug 19 '20 at 08:25
  • I had a look at your data with the interest to predict... And I think you should reconsider some aspects. Training on 600 points could be not enough on daily data because you're not getting even 2 cycles.. Second of all, you should have a look at this https://robjhyndman.com/hyndsight/dailydata/. auto.arima supported by fourier series could be good for you. I dont know why you want to lag by one day the other ts, but I guess you have your reasons for that. With autoarima / fourier series and 3 years training data, in the following 25 days you have decent predictions. I tried it out. – Edo Aug 19 '20 at 09:29
  • Thank you. This actually worked, though I want to put the real values and the prediction at the same plot. This way it only shows the forecasts. I know we can use ```autoplot(real_values)+autolayer(forecasted_values)``` to do but I am not sure how to apply it to the list. I tried ```lapply``` and it is not giving any plot. Another thing which I am trying to do is to make the x axis the dates rather than the indexes. – Agi Aug 20 '20 at 15:06
  • I have edited the code and it is what I am running right now. It includes the changes you suggested. – Agi Aug 20 '20 at 15:13
  • you left `forc <- cbind(forc, list(h))` instead of `forc <- c(forc, list(h))` – Edo Aug 20 '20 at 15:14
  • look at my final edit. I think at this point we are done here. – Edo Aug 20 '20 at 16:19
  • I tried to run it and I am getting an error just at the beginning. Other than that, the dates I have don't contain weekends and holidays, they are only trading days. Thank you anyways, I will try to autoplot it using the list I get. – Agi Aug 20 '20 at 19:18
  • I ran it with your data and I have no error. Clean up your environment before running it again. Take the latest edit I did because I made a few. For the issue related to the missing weekends, you need to adjust the frequency and the code will still be fine. If you think the answer is what you were looking for, you should selected it, so to let other users know. They may find it more useful this way. – Edo Aug 20 '20 at 19:26