0

My data set is stored in "ts_monthly". This dataset is pretty large (1812 rows x 28 columns) and is a time series.

I wanted to compute an In-Sample OLS (using the historical average as a benchmark) but I do not know how to compute the average (avg) of the latter with an expanding window.

So far, this is the code:

#1 Historical Average

avg <- mean(window(ts_monthly, start = c(1871,01), end = c(2021,12)) [,c('log_eqpt1')], na.rm=TRUE) #HA Model

IS_error_N <- (window(ts_monthly, start = c(1871,01), end = c(2021,12)) [,c('log_eqpt1')]-avg) #HA error
#2) OLS

IS_error_A <- model$residuals

#3) IS CUMULATIVE SSE DIFFERENCE

IS <- cumsum(IS_error_N^2) - cumsum(IS_error_A^2) #this is the cumulative squared prediction errors of the Null - the cumulative squared prediction error of the Alternative. Also referred as "Cumulative SEE Difference"

The dput(head(ts_monthly,20)) gives the following:

structure(c(187101, 187102, 187103, 187104, 187105, 187106, 187107,  187108, 187109, 187110, 187111, 187112, 187201, 187202, 187203,  187204, 187205, 187206, 187207, 187208, 4.44, 4.5, 4.61, 4.74, 
4.86, 4.82, 4.73, 4.79, 4.84, 4.59, 4.64, 4.74, 4.86, 4.88, 5.04, 
5.18, 5.18, 5.13, 5.1, 5.04, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 
0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.2633, 0.2667, 0.27, 0.2733, 
0.2767, 0.28, 0.2833, 0.2867, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 
0.4, 0.4, 0.4, 0.4, 0.4, 0.4025, 0.405, 0.4075, 0.41, 0.4125, 
0.415, 0.4175, 0.42, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.004967, 
0.004525, 0.004252, 0.004643, 0.003698, 0.003373, 0.003255, 0.003713, 
0.004554, 0.007041, 0.006488, 0.006975, 0.005764, 0.005218, 0.006001, 
0.00603, 0.004953, 0.004067, 0.004399, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -2.83772802441074, 
-2.85115104474288, -2.87530150497517, -2.90311078367359, -2.92811208587901, 
-2.91984757602912, -2.90099885047045, -2.91360405938943, -2.92398836869515, 
-2.87095367203906, -2.88178801420477, -2.90311078367359, -2.91549965039228, 
-2.90677606765922, -2.92673940206704, -2.94199024231951, -2.92962644846065, 
-2.90807133499557, -2.89048941187595, -2.86672498817609, NA, 
-2.83772802441074, -2.85115104474288, -2.87530150497517, -2.90311078367359, 
-2.92811208587901, -2.91984757602912, -2.90099885047045, -2.91360405938943, 
-2.92398836869515, -2.87095367203906, -2.88178801420477, -2.89049834818686, 
-2.90266928570657, -2.89447853984882, -2.91459126813139, -2.92962644846065, 
-2.91777073208428, -2.89635453132834, -2.87855944582309, -2.40694510831829, 
-2.42036812865043, -2.44451858888271, -2.47232786758114, -2.49732916978656, 
-2.48906465993666, -2.470215934378, -2.48282114329698, -2.4932054526027, 
-2.44017075594661, -2.45100509811232, -2.47232786758114, -2.49109862003592, 
-2.48901343174065, -2.5151204283845, -2.53640317555518, -2.53032412947879, 
-2.51458241793412, -2.50271127442151, -2.484906649788, -0.430782916092454, 
-0.430782916092454, -0.430782916092454, -0.430782916092454, -0.430782916092454, 
-0.430782916092454, -0.430782916092454, -0.430782916092454, -0.430782916092454, 
-0.430782916092454, -0.430782916092454, -0.430782916092454, -0.424401030356354, 
-0.417762635918571, -0.411618973682543, -0.405587066764331, -0.399302318981855, 
-0.393488917061449, -0.387778137454439, -0.38181833838809, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), dim = c(20L,  28L), dimnames
= list(NULL, c("yyyymm", "Index", "D12", "E12",  "b.m", "tbl", "AAA", "BAA", "lty", "ntis", "Rfree", "infl", "ltr",  "corpr", "svar", "csp", "CRSP_SPvw", "CRSP_SPvwx", "log_eqp",  "eqp", "dp", "dy", "ep", "de", "tms", "dfy", "dfr", "log_eqpt1" )))

In this case "avg" compute an overall average of all data present in ts_monthly. How can I transform it in order to make it a recursive/expanding window?

Thanks!

I59
  • 1
  • 1
  • 4
    Hi @I59, welcome! Can you please provide some sample data (e.g., `dput(head(ts_monthly, 20))` and sample desired output, as well as code to create your `model` object? See [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/17303805). Thanks! – zephryl Apr 06 '23 at 12:45
  • Hi @zephryl thank you! Sure sorry for that. I added in the main question the output for dput(head(ts_monthly, 20). Moreover the model object, is a linear regression, computed as follow: model <- lm(log_eqpt1 ~ dp, data = window(ts_monthly, start = c(1871,01), end = c(2021,12))) Thanks in advance! – I59 Apr 06 '23 at 17:18

1 Answers1

0

It sounds like you're trying to subtract each row from the average to date for that row? If so, you can use dplyr::cummean(). Note column log_eqpt1 is all NA in your sample data, so I used column dp instead.

library(dplyr)

# coerce sample data to time-series object
ts_monthly <- ts(ts_monthly, frequency = 12, start = c(1871, 1), end = c(1872, 8))

cum_avg_dp <- cummean(ts_monthly[, "dp"])

cum_avg_error <- ts_monthly[, "dp"] - cum_avg_dp
cum_avg_error
#>               Jan          Feb          Mar          Apr          May
#> 1871  0.000000000 -0.006711510 -0.020574647 -0.036287944 -0.049031397
#> 1872 -0.021253926 -0.011635319 -0.029492076 -0.041946484 -0.027842532
#>               Jun          Jul          Aug          Sep          Oct
#> 1871 -0.033972406 -0.012963155 -0.022372318 -0.029117002  0.021525925
#> 1872 -0.005938118  0.011030974  0.033055628                          
#>               Nov          Dec
#> 1871  0.009719621 -0.010636220
#> 1872

If you don't want the current value included in the average, use dplyr::lag():

cum_avg_error <- ts_monthly[, "dp"] - dplyr::lag(cum_avg_dp)
cum_avg_error
#>               Jan          Feb          Mar          Apr          May
#> 1871           NA -0.013423020 -0.030861970 -0.048383926 -0.061289246
#> 1872 -0.023025086 -0.012530343 -0.031598653 -0.044742916 -0.029582690
#>               Jun          Jul          Aug          Sep          Oct
#> 1871 -0.040766887 -0.015123681 -0.025568364 -0.032756627  0.023917694
#> 1872 -0.006287419  0.011643806  0.034795397                          
#>               Nov          Dec
#> 1871  0.010691583 -0.011603149
#> 1872

Created on 2023-04-06 with reprex v2.0.2

zephryl
  • 14,633
  • 3
  • 11
  • 30