Assuming that the question is asking how to form the sequence of start, end and start-of-testing (oos) dates given `st` and `en` shown below, first form the sequence of months and then transform it to append the start-of-test date. To do that, `seq` can generate a sequence of beginning-of-month `Date` values. Also, adding an integer to a `Date` object adds that number of days (or subtracts it, if the integer is negative), so we can get the end of a month by subtracting one day from the start of the next month.
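Here is a short illustration of this `Date` arithmetic on the first few months of 2019. It is only a demonstration sketch and is not part of the solution itself:

```r
# first day of each of the first four months of 2019
month_starts <- seq(as.Date("2019-01-01"), by = "month", length.out = 4)
print(month_starts)

# Adding (or subtracting) an integer to a Date shifts it by that many days,
# so the last day of each month is the first day of the next month minus 1.
month_ends <- month_starts[-1] - 1
print(month_ends)  # "2019-01-31" "2019-02-28" "2019-03-31"
```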
We have allocated 70% of each three-month period to training and 30% to testing, making use of the fact that the difference between two `Date` objects is the number of days between them. 70/30 is what the question asks for; however, it means that in each period there will be a few days that are not in any test, whereas the diagram has no days outside a test except at the beginning. If all days are to be in a test, then we might instead use the third month of each period as the test period and the first two months as the training period. In that case, uncomment the commented-out `transform` line. We also show this variation at the end.
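As a worked check of the 70/30 arithmetic, here is a sketch for the first two windows. It also lists the days that fall into neither test period. Rounding to a whole day with `round()` is our own choice, made so that the test dates are ordinary whole-day dates:

```r
# first two windows of the 3-month sequence
start1 <- as.Date("2019-01-01"); end1 <- as.Date("2019-03-31")
start2 <- as.Date("2019-02-01"); end2 <- as.Date("2019-04-30")

# The difference of two Dates is a difftime in days, so end - start + 1 is
# the number of days in the window, counting both endpoints.
len1 <- as.numeric(end1 - start1 + 1)   # 90
len2 <- as.numeric(end2 - start2 + 1)   # 89

# Testing starts after 70% of the window, rounded to a whole day.
test1 <- start1 + round(0.7 * len1)     # 2019-03-05
test2 <- start2 + round(0.7 * len2)     # 2019-04-04

# Window 1 tests up to end1 and window 2 only starts testing at test2,
# so the days in between belong to no test period.
uncovered <- seq(end1 + 1, test2 - 1, by = "day")
print(uncovered)  # "2019-04-01" "2019-04-02" "2019-04-03"
```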
Finally, define a function `f` with arguments `start`, `end` and `test` to perform whatever calculation is needed. We have shown a dummy calculation so that the code can be run. The function can produce any sort of output object for one train/test instance. We can use either `Map` or `by`, as shown below. In either case, the result has one component per row of `d`.
```r
# input
st <- as.Date("2019-01-01")
en <- as.Date("2019-12-31")

# first day of each month from st to en
months <- seq(st, en, by = "month")

# each row is a 3-month window: start = first day of month i,
# end = last day of month i + 2 (first day of month i + 3 minus 1)
d <- data.frame(start = head(months, -2), end = c(tail(months, -3) - 1, en))

# append the date that testing starts: 70% of the days in the window,
# rounded to a whole day (d is shown below)
d <- transform(d, test = start + round(.7 * (end - start + 1)))
# d <- transform(d, test = tail(months, -2))

# replace this with your function. Can be many lines.
f <- function(start, end, test) {
  data.frame(start, end, test) # dummy calc - just show dates
}

# use `Map` or `by` to run f nrow(d) times giving a list of results,
# one component per row of d
with(d, Map(f, start, end, test))
# or
by(d, 1:nrow(d), with, f(start, end, test))
```
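To show what a non-dummy `f` might look like, here is a sketch that assumes a hypothetical daily series `x` with columns `date` and `value` (simulated here; substitute your own data). This hypothetical `f` uses the mean of the training days as its forecast and reports the RMSE on the test days. It only illustrates the train/test bookkeeping and is not a recommended model:

```r
# Hypothetical daily series; replace with your own data.
set.seed(1)
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
x <- data.frame(date = dates, value = cumsum(rnorm(length(dates))))

# Rebuild the 3-month windows (70/30 split, rounded to whole days).
months <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "month")
d <- data.frame(start = head(months, -2),
                end = c(tail(months, -3) - 1, as.Date("2019-12-31")))
d <- transform(d, test = start + round(.7 * (end - start + 1)))

# Hypothetical f: training days are [start, test), test days are [test, end].
f <- function(start, end, test) {
  train <- subset(x, date >= start & date < test)
  testset <- subset(x, date >= test & date <= end)
  pred <- mean(train$value)  # mean-only "model" fitted on training days
  data.frame(start, end, test,
             n_train = nrow(train), n_test = nrow(testset),
             rmse = sqrt(mean((testset$value - pred)^2)))
}

# one data frame per window, stacked into a single data frame
res <- with(d, Map(f, start, end, test))
out <- do.call(rbind, res)
print(out)
```

Returning a one-row data frame from `f` is just a convenient choice here because `do.call(rbind, ...)` can then stack the results; `f` could equally return a fitted model object and the list from `Map` kept as is.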
The data frame `d` above is:

```
> d
        start        end       test
1  2019-01-01 2019-03-31 2019-03-05
2  2019-02-01 2019-04-30 2019-04-04
3  2019-03-01 2019-05-31 2019-05-04
4  2019-04-01 2019-06-30 2019-06-04
5  2019-05-01 2019-07-31 2019-07-04
6  2019-06-01 2019-08-31 2019-08-04
7  2019-07-01 2019-09-30 2019-09-03
8  2019-08-01 2019-10-31 2019-10-04
9  2019-09-01 2019-11-30 2019-11-04
10 2019-10-01 2019-12-31 2019-12-04
```
If we had used the commented-out version of `d`, it would look like this (the same except for the last column):

```
        start        end       test
1  2019-01-01 2019-03-31 2019-03-01
2  2019-02-01 2019-04-30 2019-04-01
3  2019-03-01 2019-05-31 2019-05-01
4  2019-04-01 2019-06-30 2019-06-01
5  2019-05-01 2019-07-31 2019-07-01
6  2019-06-01 2019-08-31 2019-08-01
7  2019-07-01 2019-09-30 2019-09-01
8  2019-08-01 2019-10-31 2019-10-01
9  2019-09-01 2019-11-30 2019-11-01
10 2019-10-01 2019-12-31 2019-12-01
```
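To check that the 70/30 split leaves some days outside every test period while the 2 months/1 month split does not, the following sketch lists the uncovered days for each version. `uncovered_days` is a hypothetical helper, and the 70/30 test dates are rounded to whole days as an assumption:

```r
# Rebuild the windows as in the code above.
st <- as.Date("2019-01-01")
en <- as.Date("2019-12-31")
months <- seq(st, en, by = "month")
d <- data.frame(start = head(months, -2), end = c(tail(months, -3) - 1, en))

# Hypothetical helper: days from the first test date to the last end date
# that lie in no test period [test, end] of any window.
uncovered_days <- function(d) {
  days <- seq(min(d$test), max(d$end), by = "day")
  in_test <- vapply(seq_along(days),
                    function(i) any(days[i] >= d$test & days[i] <= d$end),
                    logical(1))
  days[!in_test]
}

d7030 <- transform(d, test = start + round(.7 * (end - start + 1)))
d21 <- transform(d, test = tail(months, -2))

gaps7030 <- uncovered_days(d7030)
gaps21 <- uncovered_days(d21)
print(head(gaps7030))  # the first days of the third month of later windows
print(length(gaps21))  # 0: every day from the first test date on is tested
```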
Graphics
We can display these as Gantt charts using ggplot2, with the training part of each window in green and the test part in blue.
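```r
library(ggplot2)
library(gridExtra)  # grid.arrange
library(scales)     # date_format, date_breaks

n <- nrow(d)

# One horizontal bar per window, first window at the top: green from start
# to test is the training part, blue from test to end is the test part.
# Note: ggplot2 >= 3.4.0 prefers the linewidth aesthetic over size for
# lines and may warn about size here.
Plot <- function(x, main) {
  ggplot(x, aes(size = I(15))) +
    geom_segment(aes(x = start, xend = test, y = n:1, yend = n:1), col = "green") +
    geom_segment(aes(x = test, xend = end, y = n:1, yend = n:1), col = "blue") +
    scale_x_date(labels = date_format("%b\n%Y"), breaks = date_breaks("month")) +
    ggtitle(main) +
    theme(legend.position = "none",
          axis.title = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          panel.grid.major = element_line(colour = "#808080"))
}

# 70/30 split, test date rounded to a whole day
d <- transform(d, test = start + round(.7 * (end - start + 1)))
g1 <- Plot(d, "70/30")

# first two months train, third month tests
d <- transform(d, test = tail(months, -2))
g2 <- Plot(d, "2 months/1 month")

grid.arrange(g1, g2, ncol = 2)
```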
