Time Series and Forecasting

Question

My raw data is Date, A:z. I need to have each column/vector A:z as independent ts() time series. So I can run auto.armia and forecast functions on each Vector. I can read in successfully use seq_along to create seperate df A:Z in my global environment. My trouble now is looping through each df and turning them into time series, followed by looping through each with auto.armia, and forecast functions. End result should be a df with point forecast for each A:Z for time I specify (1 year or 5 years out) I want to set the forecast criteria for number of next periods as a variable.

    structure(list(YEAR = c(2001, 2002, 2003, 2004, 2005, 2006), 
A = c(0, 0, 0, 2003, 0, 0), B = c(0, 0, 0, 2004, 0, 0), C = c(0, 
0, 0, 2005, 0, 0), D = c(0, 0, 0, 2006, 0, 0), E = c(0, 0, 
0, 2007, 0, 0), F = c(0, 0, 0, 2008, 0, 0), G = c(0, 0, 0, 
2009, 0, 2310593.63), H = c(0, 0, 0, 2010, 0, 949885.17), 
I = c(0, 0, 0, 2011, 51939.35, 755167.32), J = c(0, 0, 0, 
2012, 200485.83, 0), K = c(0, 0, 0, 2013, 340741.25, 0), 
L = c(0, 0, 0, 2014, 692627.39, 0), M = c(0, 0, 0, 2015, 
498738.38, 13228.06), N = c(0, 0, 0, 2016, 727855.33, 151441.77
), O = c(0, 0, 0, 2017, 1197076.02, 108188.58), P = c(0, 
0, 0, 2018, 558267.98, 0), Q = c(0, 0, 0, 2019, 631624.18, 
0), R = c(0, 0, 0, 2020, 1348869.22, 0), S = c(0, 0, 0, 2021, 
1206861.95, 0), T = c(0, 0, 0, 2022, 0, 0), U = c(0, 0, 0, 
2023, 0, 0), V = c(0, 0, 0, 2024, 0, 0), W = c("0", "0", 
"0", "Grand Total", "7455086.88", "4288504.53"), X = c(0, 
0, 0, 2011, 51939.35, 755167.32), Y = c(0, 0, 0, 2012, 200485.83, 
0), Z = c(0, 0, 0, 2013, 340741.25, 0)), row.names = c(NA,

6L), class = "data.frame")

Can you paste the output of `dput(head(raw_data))` Also see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — langtang, Feb 21 '22 at 23:32
sorry, I don't understand this structure or the question at all. Column W has string values `"0","0"."0","Grand Total", "7455086.88", "4288504.53"` What does that even mean?, Rows 1, 2, and 3 are all zeroes. Row 4 looks like a combination of numbers (years?) and then text ("Grand Total") and more numbers. Sorry, your question needs much more specificity — langtang, Feb 22 '22 at 14:20
There is no "Grand Total" Just a DF with 27 columns/vectors that are independent of each other with values for each year 2001-2021. Column 1 is YEAR. I want to turn each column/vector into time series or entire data frame into time series so I can forecast the next nth number for each column/vector. Currently 26 but good be more in future. Model needs to accept any number of columns/vectors and years — Kyle Overton, Feb 22 '22 at 16:37
sorry, I can't be of much help here.. If you drop column "W", which does have "Grand Total" in it, you can then melt and split like this: `split(data.table::melt(setDT(data)[,!c("W")],id="YEAR"), by="variable")`, which will give you 25 data frames, one for each of your columns A through Z, excluding W. — langtang, Feb 22 '22 at 16:54
I can split fine, just having trouble applying ts() or a time series to each df after split. — Kyle Overton, Feb 22 '22 at 17:01

langtang · Accepted Answer · 2022-02-22T19:00:57.740

Here are some options to generate separate time series for each column:

If your structure above is data, then you can do the following

Option 1: Set as data.table and apply `ts()` to each column A-Z

library(data.table)
setDT(data)
dt_as_ts = data[, lapply(.SD, ts, start=2001, end=2006), .SDcols=c(2:27)]

This will return a data.table where each column is an object of class "ts". Output:

      A    B    C    D    E    F       G        H         I        J        K        L         M        N         O      P        Q       R       S    T    U    V           W         X        Y        Z
   <ts> <ts> <ts> <ts> <ts> <ts>    <ts>     <ts>      <ts>     <ts>     <ts>     <ts>      <ts>     <ts>      <ts>   <ts>     <ts>    <ts>    <ts> <ts> <ts> <ts>        <ts>      <ts>     <ts>     <ts>
1:    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0           0      0.00      0.0      0.0
2:    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0           0      0.00      0.0      0.0
3:    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0           0      0.00      0.0      0.0
4: 2003 2004 2005 2006 2007 2008    2009   2010.0   2011.00   2012.0   2013.0   2014.0   2015.00   2016.0    2017.0   2018   2019.0    2020    2021 2022 2023 2024 Grand Total   2011.00   2012.0   2013.0
5:    0    0    0    0    0    0       0      0.0  51939.35 200485.8 340741.2 692627.4 498738.38 727855.3 1197076.0 558268 631624.2 1348869 1206862    0    0    0  7455086.88  51939.35 200485.8 340741.2
6:    0    0    0    0    0    0 2310594 949885.2 755167.32      0.0      0.0      0.0  13228.06 151441.8  108188.6      0      0.0       0       0    0    0    0  4288504.53 755167.32      0.0      0.0

You can predict the next value for each of these (except W, its not numeric), like this:

t(dt_of_ts[,lapply(.SD, function(x) predict(arima(x))), .SDcols=-23])

Option 2: Just apply `ts()` directly to the columns of interest

Alternatively, you can feed the entire data as matrix to ts() like this, excluding the first column, year

data_as_ts=ts(data[,-1], start=2001, end=2006)
data_as_ts
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
        A    B    C    D    E    F       G        H         I        J        K        L         M        N         O      P        Q       R       S    T    U    V W         X        Y        Z
2001    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0 1      0.00      0.0      0.0
2002    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0 1      0.00      0.0      0.0
2003    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0 1      0.00      0.0      0.0
2004 2003 2004 2005 2006 2007 2008    2009   2010.0   2011.00   2012.0   2013.0   2014.0   2015.00   2016.0    2017.0   2018   2019.0    2020    2021 2022 2023 2024 4   2011.00   2012.0   2013.0
2005    0    0    0    0    0    0       0      0.0  51939.35 200485.8 340741.2 692627.4 498738.38 727855.3 1197076.0 558268 631624.2 1348869 1206862    0    0    0 3  51939.35 200485.8 340741.2
2006    0    0    0    0    0    0 2310594 949885.2 755167.32      0.0      0.0      0.0  13228.06 151441.8  108188.6      0      0.0       0       0    0    0    0 2 755167.32      0.0      0.0

This will return an object of class: "mts" "ts" "matrix", and each of the columns is a an object of class "ts". For example, class(data_as_ts[,4]) returns "ts"

Note carefully how column W was converted to numeric.

You can use apply to get the prediction for each column

apply(data_as_ts,2,function(x) predict(arima(x)))

Option 3: Split `data` into separate frames, and use `lapply()` to return a list of `ts` objects

Finally, if you want to split the frame, by column, and have a list of separate ts objects you can do this:

list_of_ts = lapply(split(melt(setDT(data)[,!c("W")], id="YEAR"), by="variable"), 
       function(x) ts(x$value, start=2001, end=2006)
)

Output (first three elements)

$A
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
[1]    0    0    0 2003    0    0

$B
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
[1]    0    0    0 2004    0    0

$C
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
[1]    0    0    0 2005    0    0

Again, you can use lapply to get the next prediction for each item of the list

lapply(list_of_ts,function(x) predict(arima(x)))

This work for ts() time series on each column but now how to apply auto.armia so I can forecast the next values for each column or vector in this example would be the next A:Z? — Kyle Overton, Feb 22 '22 at 18:23
you might want to open a new question - how to apply arima to list of time series objects? — langtang, Feb 22 '22 at 18:32
@KyleOverton, what about just using `lapply(list_of_ts,function(x) predict(arima(x)))`.. That will return the estimate for 2007...for each of the 26 time series.. — langtang, Feb 22 '22 at 18:34
Error in arima(x) : only implemented for univariate time series — Kyle Overton, Feb 22 '22 at 18:44
you only showed univariate time series (and `list_of_ts` as I define in my answer is a list of univariate time series) - that's why I don't get that error. — langtang, Feb 22 '22 at 18:49
I've added to the answer a way to use `arima` on each `ts`, and get the next (2007) value — langtang, Feb 22 '22 at 18:56
Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, : non-finite value supplied by optim All 3 options return this same error — Kyle Overton, Feb 22 '22 at 20:56

Time Series and Forecasting

1 Answers1

Option 1: Set as data.table and apply ts() to each column A-Z

Option 2: Just apply ts() directly to the columns of interest

Option 3: Split data into separate frames, and use lapply() to return a list of ts objects

Option 1: Set as data.table and apply `ts()` to each column A-Z

Option 2: Just apply `ts()` directly to the columns of interest

Option 3: Split `data` into separate frames, and use `lapply()` to return a list of `ts` objects