Issue:
Using fable I can easily produce forecasts on a time series with a grouped structure, and can even use Fable's aggregate_key
/ reconcile
syntax to produce a coherent top-level forecast. However I'm unable to easily access the aggregate forecasts using this method, and the alternative I'm using involves abandoning the fable (forecast table) structure. Can anyone tell me if there's an easier/intended way to do this using the package? As you can see in the examples, I'm able to get there using other methods, but I'd like to know if there's a better way. Any help gratefully received!
Approach 1:
My efforts to summarise the forecast without using aggregate_key
/ reconcile
have been mainly using dplyr's group_by
and summarise
, however the prediction interval for the forecast is formatted as a normal distribution object, which doesn't seem to support summing using this method. To get around this I've been using hilo
and unpack_hilo
to extract bounds for different prediction intervals, which can then be summed using the usual method. However I'd really like to retain the fable structure and the distribution objects, which is impossible using this method.
Approach 2:
The alternative, using aggregate_key
/ reconcile
only seems to support aggregation using min_trace
. I understand that this method is for optimum reconciliation, whereas what I want is a simple bottom-up aggregate forecast. It feels like there should be an easy way to get bottom-up forecasts using this syntax, but I haven't found one so far. Moreover, even using min_trace
I'm unsure how to access the aggregate forecast itself as you can see in the example!
Example using approach 1:
library(fable)
#> Loading required package: fabletools
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))
fc_1 <- lung_deaths_agg %>%
model(lm = TSLM(value ~ trend() + season())) %>%
forecast()
fc_1
#> # A fable: 48 x 5 [1M]
#> # Key: key, .model [2]
#> key .model index value .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 fdeaths lm 1980 Jan N(794, 5940) 794.
#> 2 fdeaths lm 1980 Feb N(778, 5940) 778.
#> 3 fdeaths lm 1980 Mar N(737, 5940) 737.
#> 4 fdeaths lm 1980 Apr N(577, 5940) 577.
#> 5 fdeaths lm 1980 May N(456, 5940) 456.
#> 6 fdeaths lm 1980 Jun N(386, 5940) 386.
#> 7 fdeaths lm 1980 Jul N(379, 5940) 379.
#> 8 fdeaths lm 1980 Aug N(335, 5940) 335.
#> 9 fdeaths lm 1980 Sep N(340, 5940) 340.
#> 10 fdeaths lm 1980 Oct N(413, 5940) 413.
#> # ... with 38 more rows
fc_1 %>%
hilo() %>%
unpack_hilo(c(`80%`, `95%`)) %>%
as_tibble() %>%
group_by(index) %>%
summarise(across(c(.mean, ends_with("upper"), ends_with("lower")), sum))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 24 x 6
#> index .mean `80%_upper` `95%_upper` `80%_lower` `95%_lower`
#> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1980 Jan 2751. 3089. 3267. 2414. 2236.
#> 2 1980 Feb 2687. 3024. 3202. 2350. 2171.
#> 3 1980 Mar 2535. 2872. 3051. 2198. 2020.
#> 4 1980 Apr 2062. 2399. 2577. 1725. 1546.
#> 5 1980 May 1597. 1934. 2113. 1260. 1082.
#> 6 1980 Jun 1401. 1738. 1916. 1064. 885.
#> 7 1980 Jul 1343. 1680. 1858. 1006. 827.
#> 8 1980 Aug 1200. 1538. 1716. 863. 685.
#> 9 1980 Sep 1189. 1527. 1705. 852. 674.
#> 10 1980 Oct 1482. 1819. 1998. 1145. 967.
#> # ... with 14 more rows
Example using approach 2:
fc_2 <- lung_deaths_agg %>%
aggregate_key(key, value = sum(value)) %>%
model(lm = TSLM(value ~ trend() + season())) %>%
reconcile(lm = min_trace(lm)) %>%
forecast()
fc_2
#> # A fable: 72 x 5 [1M]
#> # Key: key, .model [3]
#> key .model index value .mean
#> <chr> <chr> <mth> <dist> <dbl>
#> 1 fdeaths lm 1980 Jan N(794, 5606) 794.
#> 2 fdeaths lm 1980 Feb N(778, 5606) 778.
#> 3 fdeaths lm 1980 Mar N(737, 5606) 737.
#> 4 fdeaths lm 1980 Apr N(577, 5606) 577.
#> 5 fdeaths lm 1980 May N(456, 5606) 456.
#> 6 fdeaths lm 1980 Jun N(386, 5606) 386.
#> 7 fdeaths lm 1980 Jul N(379, 5606) 379.
#> 8 fdeaths lm 1980 Aug N(335, 5606) 335.
#> 9 fdeaths lm 1980 Sep N(340, 5606) 340.
#> 10 fdeaths lm 1980 Oct N(413, 5606) 413.
#> # ... with 62 more rows
fc_2 %>% as_tibble() %>% select(key) %>% slice(50:55)
#> # A tibble: 6 x 1
#> key
#> <chr>
#> 1 <aggregated>
#> 2 <aggregated>
#> 3 <aggregated>
#> 4 <aggregated>
#> 5 <aggregated>
#> 6 <aggregated>
fc_2 %>% as_tibble() %>% select(key) %>% filter(key == "<aggregated>")
#> # A tibble: 0 x 1
#> # ... with 1 variable: key <chr>