
I need to perform walk-forward optimization on a time series. The attached image shows a diagram of how this should be done. I have to run my data-processing function on each period, and the number of periods has to follow from variables (for example, I assign a start date and an end date, and each period in the test should be 1 month). My problem is as follows: I do not know how to shift the dates by the length of the out-of-sample period and get a table with the results of the calculations for each period as the output of the function. The out-of-sample period will be 30% of the total length of the selected period. What tools in R can I use to solve my problem?

start date: 2019-01-01, end date: 2019-12-31

  1. first period: from 2019-01-01 to 2019-03-31
  2. second period: from 2019-02-01 to 2019-04-30 etc...
Ahmad
    It will be extremely challenging to answer your question without 1) at least a sample of your data, 2) the code you have tried thus far, and 3) your expected output. To address #1, please provide the output of `dput(data)` or `dput(head(data))` if your data is very large. Paste the output directly into your question by pressing the [edit] button. See [How to Make a Great R Reproducible Example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. – Ian Campbell Jan 01 '21 at 22:37
  • Yes, my example and my processing function are large. For me, the most important thing is to implement the shift of the dates by a given size. – Михаил Табаков Jan 01 '21 at 23:03
  • Even with additional insight from the question asker's self-answer, I think the question remains unclear. If the question was edited, I would certainly be willing to reconsider. – Ian Campbell Jan 02 '21 at 07:21

2 Answers

Assuming the question is how to form the sequence of start, end and start-of-test (out-of-sample, oos) dates from the st and en shown below: first form the sequence of month starts, then use transform to append the start-of-test date. seq can generate a beginning-of-month Date sequence. Adding an integer to a Date object shifts it by that number of days (a negative integer shifts it back), so we can get the end of a month by subtracting one day from the start of the next month.
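As a minimal sketch of the two date facts used here (the variable names month_starts, next_starts and month_ends are illustrative only and do not appear in the code further down):

```r
# Month starts for 2019, generated with seq() on a Date
month_starts <- seq(as.Date("2019-01-01"), as.Date("2019-12-01"), by = "month")

# End of each month = start of the following month minus one day
next_starts <- seq(as.Date("2019-02-01"), by = "month", length.out = 12)
month_ends <- next_starts - 1

# Show the first three months side by side
head(data.frame(month_starts, month_ends), 3)
```

Because the month ends are derived from the next month's start, February and the 30-day months need no special handling.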

We have allocated 70% of each three-month period to training and 30% to testing, making use of the fact that the difference between two Date objects is the number of days between them. A 70/30 split is what the question asks for; however, it leaves a few days in each period that fall in no test, whereas in the diagram the only days not in any test are those at the beginning. If all days are to be in a test, we can instead use the third month of each period as the test period and the first two months as the training period. In that case, uncomment the commented-out transform line. This variation is also shown at the end.
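To make the 70/30 arithmetic concrete, here is a small worked sketch for the first two periods. It rounds the training length down with floor, which is an assumption on my part (the transform line in this answer leaves the fractional day in place), and all variable names are illustrative.

```r
# First period: 2019-01-01 to 2019-03-31
start1 <- as.Date("2019-01-01")
end1   <- as.Date("2019-03-31")

n_days1  <- as.integer(end1 - start1) + 1  # 90 days, both endpoints included
n_train1 <- floor(0.7 * n_days1)           # 63 training days
test1    <- start1 + n_train1              # 2019-03-05, first test day
n_test1  <- n_days1 - n_train1             # 27 test days

# Second period: 2019-02-01 to 2019-04-30
start2 <- as.Date("2019-02-01")
end2   <- as.Date("2019-04-30")
test2  <- start2 + floor(0.7 * (as.integer(end2 - start2) + 1))  # 2019-04-04

# Days after the first test ends and before the second test starts
gap <- as.integer(test2 - end1) - 1        # 3 days: April 1 to April 3
```

The three-day gap is what "a few days not in any test" refers to: the first test ends on March 31, while the second one only starts on April 4.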

Finally define a function f (we have shown a dummy calculation to make it possible to run the code) with arguments start, end and test to perform whatever calculation is needed. It can produce any sort of output object for one train/test instance. We can use either Map or by as shown below. The output list of results will have one component per row of d.

# input
st <- as.Date("2019-01-01")
en <- as.Date("2019-12-31")

months <- seq(st, en, by = "month")
d <- data.frame(start = head(months, -2), end = c(tail(months, -3) - 1, en))
# append date that test starts -- d shown at end
d <- transform(d, test = start + .7 * (end - start + 1))
# d <- transform(d, test = tail(months, -2))

# replace this with your function.  Can be many lines.
f <- function(start, end, test) {
  data.frame(start, end, test) # dummy calc - just show dates
}

# use `Map` or `by` to run f nrow(d) times giving a list of results, 
# one component per row of d
with(d, Map(f, start, end, test))
# or
by(d, 1:nrow(d), with, f(start, end, test))

The data frame d above is:

> d
        start        end       test
1  2019-01-01 2019-03-31 2019-03-05
2  2019-02-01 2019-04-30 2019-04-04
3  2019-03-01 2019-05-31 2019-05-04
4  2019-04-01 2019-06-30 2019-06-04
5  2019-05-01 2019-07-31 2019-07-04
6  2019-06-01 2019-08-31 2019-08-04
7  2019-07-01 2019-09-30 2019-09-03
8  2019-08-01 2019-10-31 2019-10-04
9  2019-09-01 2019-11-30 2019-11-04
10 2019-10-01 2019-12-31 2019-12-04

If we had used the commented-out transform line instead, d would look like this (identical except for the last column):

        start        end       test
1  2019-01-01 2019-03-31 2019-03-01
2  2019-02-01 2019-04-30 2019-04-01
3  2019-03-01 2019-05-31 2019-05-01
4  2019-04-01 2019-06-30 2019-06-01
5  2019-05-01 2019-07-31 2019-07-01
6  2019-06-01 2019-08-31 2019-08-01
7  2019-07-01 2019-09-30 2019-09-01
8  2019-08-01 2019-10-31 2019-10-01
9  2019-09-01 2019-11-30 2019-11-01
10 2019-10-01 2019-12-31 2019-12-01
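Since the question asks for one table of results, the list returned by Map can be stacked with do.call(rbind, ...). The following is only a sketch under stated assumptions: the daily series x is randomly generated stand-in data, and the per-period calculation (means of the training and test segments) is a placeholder for the real processing function. It uses the two-months/one-month variant of d.

```r
# Hypothetical daily data standing in for the real time series
set.seed(1)
days <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
x <- data.frame(date = days, value = rnorm(length(days)))

# Windows as in the answer, with the third month of each period as the test
months <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "month")
d <- data.frame(start = head(months, -2),
                end   = c(tail(months, -3) - 1, as.Date("2019-12-31")),
                test  = tail(months, -2))

# Placeholder calculation: one summary row per train/test window
f <- function(start, end, test) {
  train <- x$value[x$date >= start & x$date < test]
  oos   <- x$value[x$date >= test & x$date <= end]
  data.frame(start, end, test,
             train_mean = mean(train), test_mean = mean(oos),
             n_train = length(train), n_test = length(oos))
}

# One data frame with a row per window
results <- do.call(rbind, with(d, Map(f, start, end, test)))
results
```

Returning a one-row data frame from f keeps the stacking step trivial; if f returns something more complex, keeping the list from Map is probably the better choice.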

Graphics

We can display both variants as Gantt charts using ggplot2.

library(ggplot2)
library(gridExtra)
library(scales)

n <- nrow(d)

Plot <- function(x, main) {
  ggplot(x, aes(size = I(15))) +
    geom_segment(aes(x = start, xend = test, y = n:1, yend = n:1), col = "green") +
    geom_segment(aes(x = test, xend = end, y = n:1, yend = n:1), col = "blue") +
    scale_x_date(labels = date_format("%b\n%Y"), breaks = date_breaks("month")) +
    ggtitle(main) +
    theme(legend.position = "none",
      axis.title = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      panel.grid.major = element_line(colour = "#808080"))

}

d <- transform(d, test = start + .7 * (end - start + 1))
g1 <- Plot(d, "70/30")

d <- transform(d, test = tail(months, -2))
g2 <- Plot(d, "2 months/1 month")

grid.arrange(g1, g2, ncol = 2)


G. Grothendieck

Thanks, everyone, for the help. I found a way to solve this by writing a small function.

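library(data.table)

# Build rolling windows: each one starts `lag` days after the previous one
# and spans `periodLength` days. Returns a list with one c(start, end)
# character vector per window.
dates <- function(startDate, endDate, periodLength, lag){
  start <- as.Date(startDate)
  end <- as.Date(endDate)
  data <- start
  # step forward by `lag` days until the last start reaches endDate
  while(data[length(data)] < end){
    x <- data[length(data)] + lag
    data <- c(data, x)  # c() keeps the Date class, unlike rbind()
  }
  end <- data + periodLength
  data <- data.table(start = data, end = end)
  # transpose so that each window becomes one list element
  data <- as.list(as.data.table(t(data)))
  return(data)
}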

where

startDate is the start date of the testing period, endDate is the end date of the testing period, periodLength is the length of one period in days, and lag is the offset between consecutive periods in days (the length of the OOS period).

dates(startDate = '2019-01-01', endDate = '2019-06-30', periodLength = 30, lag = 10)

$V1
[1] "2019-01-01" "2019-01-31"

$V2
[1] "2019-01-11" "2019-02-10"

$V3
[1] "2019-01-21" "2019-02-20"

$V4
[1] "2019-01-31" "2019-03-02"

$V5
[1] "2019-02-10" "2019-03-12"

$V6
[1] "2019-02-20" "2019-03-22"

$V7
[1] "2019-03-02" "2019-04-01"

$V8
[1] "2019-03-12" "2019-04-11"

$V9
[1] "2019-03-22" "2019-04-21"

$V10
[1] "2019-04-01" "2019-05-01"

$V11
[1] "2019-04-11" "2019-05-11"

$V12
[1] "2019-04-21" "2019-05-21"

$V13
[1] "2019-05-01" "2019-05-31"

$V14
[1] "2019-05-11" "2019-06-10"

$V15
[1] "2019-05-21" "2019-06-20"

$V16
[1] "2019-05-31" "2019-06-30"

$V17
[1] "2019-06-10" "2019-07-10"

$V18
[1] "2019-06-20" "2019-07-20"

$V19
[1] "2019-06-30" "2019-07-30"
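For comparison, the same windows can probably be built without a loop, because seq accepts a number of days as by when it is given Dates. This is only a sketch, not a drop-in replacement: rolling_windows is a hypothetical name, it returns a base data frame with Date columns rather than a list of character vectors, and seq never steps past endDate, whereas the while loop can overshoot when the range is not a multiple of lag.

```r
# Hypothetical alternative: window starts every `lag` days, each window
# ending `periodLength` days after its start
rolling_windows <- function(startDate, endDate, periodLength, lag) {
  starts <- seq(as.Date(startDate), as.Date(endDate), by = lag)
  data.frame(start = starts, end = starts + periodLength)
}

w <- rolling_windows("2019-01-01", "2019-06-30", periodLength = 30, lag = 10)

# Optionally keep only the windows that end on or before endDate
w_inside <- w[w$end <= as.Date("2019-06-30"), ]
```

For these inputs w has the same 19 windows as the list above, and w_inside keeps the first 16, dropping the three windows that run past 2019-06-30.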
