I'm struggling to understand the correct way to calculate 1-step forecast errors over a test set using the {fable}
package for R.
First, my understanding of a 1-step forecast error is that we:
- at time t, we make a prediction for the next time period, t+1
- take observation for t+1, calculate the residual, etc
- make a forecast for t+2 based on the new data we observed at t+1 (using the same model/coefs that we did at time t)
- repeat steps 1-3 over the test set.
I've had a look at this post: Does the forecast function within fable provide one-step forecasts? which aligns with the method discussed in FPP3, but I'm getting results which do not match my intuition. This method also fails when the test set is smaller than the seasonality window of the fitted models. Additionally, what do you do when there is no refit()
method available for a forecast algorithm? (such as the provided THETA()
technique).
Below is a code snipped with 2 different ways I believe should calculate 1-step forecast errors over a 24m test set after a model has been fit. The first is based on the method in the post linked above and the second is a loop that I came up with.
The accuracy and forecasts are produced for both methods, but they differ by a not-insignificant amount. Why are they different? Which one is correct?
In general, if I have a model and make a prediction for t+1, observe y_{t+1}, it does not feel clear how I would use the trained model and the new observation to make a prediction for t+2.
# Init
library(tidyverse)
library(fable)
#> Loading required package: fabletools
# Prepare data
us_accidental_deaths <- as_tsibble(USAccDeaths)
deaths_train <- head(us_accidental_deaths, -24)
deaths_test <- tail(us_accidental_deaths, 24)
# Get models on training data
deaths_fits_0 <- deaths_train %>%
model(ets = ETS(value))
##############
# Normal way #
##############
# 'refit' without estimating new params/coefs on the new data
deaths_fits_1 <- deaths_fits_0 %>%
refit(deaths_test, reestimate = FALSE)
# what happens here if the test set is smaller than the seasonality windows?
# 1-step forecast accuracy on the test set?
deaths_fits_1 %>%
accuracy()
#> # A tibble: 1 x 10
#> .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ets Training 30.4 142. 111. 0.334 1.24 0.428 0.416 -0.105
# The 1-step forecasts?
fitted(deaths_fits_1)
#> # A tsibble: 24 x 3 [1M]
#> # Key: .model [1]
#> .model index .fitted
#> <chr> <mth> <dbl>
#> 1 ets 1977 Jan 7770.
#> 2 ets 1977 Feb 6906.
#> 3 ets 1977 Mar 7680.
#> 4 ets 1977 Apr 8135.
#> 5 ets 1977 May 8964.
#> 6 ets 1977 Jun 9264.
#> 7 ets 1977 Jul 10457.
#> 8 ets 1977 Aug 9547.
#> 9 ets 1977 Sep 8520.
#> 10 ets 1977 Oct 8660.
#> # ... with 14 more rows
################
# Using a loop #
################
# Initialise fit
death_fits_2 <- deaths_fits_0
# List to store 1 step forecasts in
test_forecasts_list <- list()
# Begin loop
for (i in 1:length(unique(deaths_test$index))){
# 1 step forecast
one_step_fc <- death_fits_2 %>%
forecast(h = 1)
# store
test_forecasts_list[[i]] <- one_step_fc
# refit using the newly observed datapoint
death_fits_2 <- deaths_fits_0 %>%
refit(
bind_rows(
deaths_train,
deaths_test %>%
arrange(index) %>%
slice_head(n=i)
),
reestimate=FALSE
)
}
test_forecasts_2 <- bind_rows(test_forecasts_list)
# 1-step forecast accuracy over test set
test_forecasts_2 %>%
accuracy(data=us_accidental_deaths)
#> # A tibble: 1 x 10
#> .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ets Test 37.0 255. 194. 0.308 2.19 0.352 0.380 -0.305
# 1-step forecasts
test_forecasts_2
#> # A fable: 24 x 4 [1M]
#> # Key: .model [1]
#> .model index value .mean
#> <chr> <mth> <dist> <dbl>
#> 1 ets 1977 Jan N(7837, 110265) 7837.
#> 2 ets 1977 Feb N(7129, 95555) 7129.
#> 3 ets 1977 Mar N(7783, 93292) 7783.
#> 4 ets 1977 Apr N(7912, 91760) 7912.
#> 5 ets 1977 May N(8894, 89411) 8894.
#> 6 ets 1977 Jun N(9418, 87442) 9418.
#> 7 ets 1977 Jul N(10072, 85241) 10072.
#> 8 ets 1977 Aug N(9891, 89234) 9891.
#> 9 ets 1977 Sep N(8388, 93841) 8388.
#> 10 ets 1977 Oct N(8679, 91768) 8679.
#> # ... with 14 more rows
Created on 2022-05-31 by the reprex package (v2.0.1)