
I'm trying to view the out-of-sample accuracy scores after fitting a fable.prophet model. Note that the data are grouped by type (the tsibble key), and the forecasts look 5 observations ahead.

Here is the code:

library(tibble)
library(tsibble)
library(fable.prophet)

lax_passengers <- read.csv("https://raw.githubusercontent.com/mitchelloharawild/fable.prophet/master/data-raw/lax_passengers.csv")


library(dplyr)
library(lubridate)
lax_passengers <- lax_passengers %>%
  mutate(datetime = mdy_hms(ReportPeriod)) %>%
  group_by(month = yearmonth(datetime), type = Domestic_International) %>%
  summarise(passengers = sum(Passenger_Count)) %>%
  ungroup()

lax_passengers <- as_tsibble(lax_passengers, index = month, key = type)
fit <- lax_passengers %>% 
  model(
    mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative")),
  )
fit

test_tr <- lax_passengers %>%
  slice(1:(n()-5)) %>%
  stretch_tsibble(.init = 12, .step = 1)


fc <- test_tr %>%
  model(
    mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative")),
  ) %>%
  forecast(h = 5)


fc %>% accuracy(lax_passengers)

When I run fc %>% accuracy(lax_passengers), I get the following warning:

Warning message:
The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
5 observations are missing between 2019 Apr and 2019 Aug 

How do I make the future dataset complete? I believe the accuracy scores aren't reliable because of the 5 missing observations.

It seems that the slice() step before stretch_tsibble() doesn't work as intended: it doesn't remove the last 5 observations from each type.
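The behaviour can be reproduced in isolation. This is a minimal sketch on a small hypothetical tsibble (`toy`, with made-up keys "A" and "B" and 10 monthly values each) rather than the LAX data, assuming dplyr and tsibble are installed. On an ungrouped tsibble, `n()` inside `slice()` counts every row in the table, and rows are ordered by key, so only the last key loses observations.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: two keys ("A", "B"), 10 monthly observations each
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 10)
toy <- tibble(
  date  = rep(dates, times = 2),
  type  = rep(c("A", "B"), each = 10),
  value = 1:20
) %>%
  mutate(month = yearmonth(date)) %>%
  select(-date) %>%
  as_tsibble(index = month, key = type)

# Ungrouped slice: n() is the total row count (20), so rows 1..15 are kept.
# Because the tsibble is ordered by key, all 5 dropped rows belong to "B".
ungrouped <- toy %>% slice(1:(n() - 5))

# A keeps all 10 rows; B keeps only 5
table(ungrouped$type)
```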

QMan5

1 Answer


On an ungrouped tsibble, slice() works on the entire dataset: n() is the total number of rows, so slice(1:(n()-5)) only removes the last 5 rows of your last key (type == "International"). To remove the last 5 rows from every key, group by the keys before slicing:

test_tr <- lax_passengers %>%
  group_by_key() %>% 
  slice(1:(n()-5)) %>%
  ungroup() %>% 
  stretch_tsibble(.init = 12, .step = 1)
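One way to confirm the fix is to check that every training fold now ends at least 5 months before the last observation, so a 5-step forecast from any fold falls inside the observed data. This is a minimal sketch on a hypothetical toy tsibble (not the LAX data); `h`, `train`, `folds`, `last_train` and `last_obs` are illustrative names, and `.init = 3` is used only because the toy series is short (the answer uses 12).

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: two keys ("A", "B"), 10 monthly observations each
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 10)
toy <- tibble(
  date  = rep(dates, times = 2),
  type  = rep(c("A", "B"), each = 10),
  value = 1:20
) %>%
  mutate(month = yearmonth(date)) %>%
  select(-date) %>%
  as_tsibble(index = month, key = type)

h <- 5  # forecast horizon, matching forecast(h = 5)

# Grouped slice: n() is evaluated per key, so each key drops its last h rows
train <- toy %>%
  group_by_key() %>%
  slice(1:(n() - h)) %>%
  ungroup()

# Expanding windows of 3, 4 and 5 observations within each key
folds <- stretch_tsibble(train, .init = 3, .step = 1)

# Latest month used for training in any fold vs. latest observed month
last_train <- max(as.Date(folds$month))  # 2020-05-01
last_obs   <- max(as.Date(toy$month))    # 2020-10-01
```

Since the latest training month is exactly 5 months before the last observed month for both keys, accuracy() should no longer need to treat any forecast month as missing.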
  • This is exactly what I was looking for; I wasn't aware that you could group by keys. Thank you! – QMan5 Nov 30 '22 at 16:48
  • `group_by_key()` is just a shortcut for `group_by(type)` here. But yes, most operations in dplyr won't default to being applied to each key unless the data is grouped. – Mitchell O'Hara-Wild Nov 30 '22 at 21:58
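`group_by_key()` groups by the tsibble's key variables (those returned by `key_vars()`), so with a single key it matches `group_by(type)`. A minimal sketch on a hypothetical toy tsibble comparing the two forms; `via_key` and `via_type` are illustrative names.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: two keys ("A", "B"), 10 monthly observations each
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 10)
toy <- tibble(
  date  = rep(dates, times = 2),
  type  = rep(c("A", "B"), each = 10),
  value = 1:20
) %>%
  mutate(month = yearmonth(date)) %>%
  select(-date) %>%
  as_tsibble(index = month, key = type)

key_vars(toy)  # the only key is "type"

# Drop the last 5 rows of each key, grouping two different ways
via_key  <- toy %>% group_by_key() %>% slice(1:(n() - 5)) %>% ungroup()
via_type <- toy %>% group_by(type) %>% slice(1:(n() - 5)) %>% ungroup()

# Same rows either way (compared as plain tibbles)
all.equal(as_tibble(via_key), as_tibble(via_type))
```

The shortcut is mainly useful when a tsibble has several key columns, because it avoids listing them all by hand.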