Coerce multiple objects to time series objects with different start/end dates

Question

I am following this tutorial using the sweep package to perform tidy time series forecasting for groups of time series. Sweep extends the broom package to tidy forecast objects.

Tutorial here: https://rdrr.io/cran/sweep/f/vignettes/SW01_Forecasting_Time_Series_Groups.Rmd

Problem: the time series in my data have different lengths and start dates. In the tutorial, a fixed start is passed through to tk_ts() because each time series has the same start and end date:

monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
    mutate(data.ts = map(.x       = data.tbl, 
                         .f       = tk_ts, 
                         select   = -order.month, 
                         start    = 2011, # <- see the fixed start date here
                         freq     = 12))

Question: How do I create a list column of time series objects using map, like the example above (and in the tutorial), but with the correct start and end date for each series, given that these differ from series to series?

Packages:

library(tidyquant)
library(sweep)
library(timetk)
library(forecast)
library(tidyverse)

Reproducible Sample Data:

df <- structure(list(id = c("series_1", "series_1", "series_1", "series_1", 
"series_1", "series_1", "series_1", "series_1", "series_1", "series_1", 
"series_1", "series_1", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_2", "series_2", "series_2", "series_2", 
"series_2", "series_2", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3", "series_3", "series_3", "series_3", "series_3", 
"series_3", "series_3"), date = structure(c(10957, 10988, 11017, 
11048, 11078, 11109, 11139, 11170, 11201, 11231, 11262, 11292, 
13787, 13818, 13848, 13879, 13910, 13939, 13970, 14000, 14031, 
14061, 14092, 14123, 14153, 14184, 14214, 14245, 14276, 14304, 
14335, 14365, 14396, 14426, 14457, 14488, 15706, 15737, 15765, 
15796, 15826, 15857, 15887, 15918, 15949, 15979, 16010, 16040, 
16071, 16102, 16130, 16161, 16191, 16222, 16252, 16283, 16314, 
16344, 16375, 16405, 16436, 16467, 16495, 16526, 16556, 16587, 
16617, 16648, 16679, 16709, 16740, 16770), class = "Date"), value = c(0.526816892903298, 
0.0640646643005311, 0.569032567087561, 0.733993547270074, 0.742038151714951, 
0.273655793862417, 0.167404572479427, 0.766059899237007, 0.60176682821475, 
0.0769246644340456, 0.162491872673854, 0.323168716160581, 0.179594057612121, 
1.096650313586, 0.894524970557541, 1.55353087605909, 1.50662920810282, 
1.06641945429146, 1.95049989689142, 0.226111006457359, 0.644822218455374, 
0.998987099621445, 0.303691457025707, 0.782052680384368, 1.59218573896214, 
0.171859007328749, 1.9222901831381, 1.4127164632082, 0.919900813139975, 
1.93520273640752, 0.00968976970762014, 0.204170028213412, 1.90123205445707, 
1.05964627675712, 1.40747981145978, 0.476186634972692, 1.56826665904373, 
0.106335987104103, 2.7993093256373, 1.07078968570568, 0.668198951287195, 
0.584522894583642, 0.753677956061438, 2.76492932089604, 2.17496411106549, 
2.56561762047932, 0.586419345578179, 1.7261581714265, 1.38705582660623, 
0.708714888431132, 1.91359720285982, 1.85413848585449, 1.85429209470749, 
2.18856360157952, 1.00432092184201, 0.588805445702747, 2.95583719946444, 
0.382465981179848, 0.711439447710291, 1.24924974096939, 0.961857272777706, 
2.26519317110069, 1.10985011514276, 0.938654307508841, 0.985875837039202, 
1.13028976111673, 2.90536748478189, 0.795255574397743, 1.4741945641581, 
2.02167924796231, 1.2093570465222, 1.47486943169497)), .Names = c("id", 
"date", "value"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-72L))

After Nesting:

df_nest <- df %>% group_by(id) %>% 
  nest(.key = data.tbl)

From here I would like to mutate a new list column that contains the same data as data.tbl, coerced to a ts object (so it can be used with the forecast package), as in the example above and in the tutorial, but with the correct start and end date for each series.

I want to apply something like this:

df_ts <- df_nest %>%
  mutate(data.ts = map(.x = data.tbl,
                       .f = tk_ts,
                       select = -date,
                       start = c(2000, 1), # <- Problem HERE
                       freq = 12))

But the problem is that this only gives the correct start date for series_1.
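
To see why a single fixed start cannot fit every series, it helps to look at the first date in each group. The following is a minimal base R sketch on made-up data; the `toy` data frame and the `first_by_id` name are illustrative only and are not part of my real data:

```r
# Toy data (hypothetical): two monthly series with different first dates
toy <- data.frame(
  id   = rep(c("a", "b"), each = 3),
  date = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01",
                   "2007-10-01", "2007-11-01", "2007-12-01"))
)

# Earliest date per series, formatted as year-month
first_by_id <- vapply(
  split(toy$date, toy$id),
  function(d) format(min(d), "%Y-%m"),
  character(1)
)

print(first_by_id)
```

Each series reports its own first month, so a hard-coded start = c(2000, 1) can only be right for a series that actually begins in January 2000.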

How do I mutate this new list column of ts objects with the correct start and end dates for each series?

Thanks

score 2 · Accepted Answer · answered Aug 11 '17 at 12:48

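Use format() to extract the year and month of each series' first date and pass them as start. This relies on data$date[1] being the earliest date, so each series must be sorted by date: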

df_ts_2 <- df_nest %>%
  mutate(data.ts = map(.x = data.tbl,
                       .f = function(data) tk_ts(
                         data, 
                         select = -date, 
                         start = as.integer(c(format(data$date[1], "%Y"), format(data$date[1], "%m"))),
                         freq = 12
                       )))

print(df_ts_2$data.ts)

# [[1]]
#             Jan        Feb        Mar        Apr        May        Jun        Jul        Aug        Sep        Oct        Nov        Dec
# 2000 0.52681689 0.06406466 0.56903257 0.73399355 0.74203815 0.27365579 0.16740457 0.76605990 0.60176683 0.07692466 0.16249187 0.32316872
# 
# [[2]]
#             Jan        Feb        Mar        Apr        May        Jun        Jul        Aug        Sep        Oct        Nov        Dec
# 2007                                                                                                    0.17959406 1.09665031 0.89452497
# 2008 1.55353088 1.50662921 1.06641945 1.95049990 0.22611101 0.64482222 0.99898710 0.30369146 0.78205268 1.59218574 0.17185901 1.92229018
# 2009 1.41271646 0.91990081 1.93520274 0.00968977 0.20417003 1.90123205 1.05964628 1.40747981 0.47618663                                 
# 
# [[3]]
#            Jan       Feb       Mar       Apr       May       Jun       Jul       Aug       Sep       Oct       Nov       Dec
# 2013 1.5682667 0.1063360 2.7993093 1.0707897 0.6681990 0.5845229 0.7536780 2.7649293 2.1749641 2.5656176 0.5864193 1.7261582
# 2014 1.3870558 0.7087149 1.9135972 1.8541385 1.8542921 2.1885636 1.0043209 0.5888054 2.9558372 0.3824660 0.7114394 1.2492497
# 2015 0.9618573 2.2651932 1.1098501 0.9386543 0.9858758 1.1302898 2.9053675 0.7952556 1.4741946 2.0216792 1.2093570 1.4748694
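
The end date does not need to be supplied separately: for a regular series it follows from the start, the frequency, and the number of observations. As a rough check of that idea, here is a self-contained sketch using base R's ts() rather than tk_ts(). The helper name `ts_start` and the dummy values are assumptions for illustration; it uses min() so that it does not depend on row order:

```r
# Hypothetical helper: c(year, month) of the earliest date in a vector
ts_start <- function(dates) {
  first <- min(dates)
  as.integer(c(format(first, "%Y"), format(first, "%m")))
}

# 24 monthly observations beginning October 2007 (dummy values)
dates  <- seq(as.Date("2007-10-01"), by = "month", length.out = 24)
values <- seq_along(dates)

x <- ts(values, start = ts_start(dates), frequency = 12)

start(x)  # year and month of the first observation
end(x)    # year and month of the last observation, implied by the length
```

Calling start() and end() on each element of the data.ts list column (for example with map()) is one way to confirm that every series received the intended date range.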
