
Has anyone dealt with calculating historical mean log returns in a time series dataset?

The dataset is ordered by security first and then by time within each security. I am trying to form a historical mean log return for each security at each point in time, i.e. the mean log return for that security from its first appearance in the dataset up to that date.

Luckily, the return time series contains NAs between the returns of different securities. My idea is to calculate a historical mean that restarts after each NA.

A simple cumsum() probably will not do it, as the NAs will have to be dropped.

I thought about using rollmean(), if I only knew an efficient way to set the 'width' parameter to the length of the run of consecutive preceding non-NAs. The current approach I am taking, based on Count how many consecutive values are true, takes far too long given the size of the data set I am working with. For any x of the form x = [r(1) r(2) ... r(N)], where r(t) is the log return in period t:

df <- data.frame(x, zcount = NA)
df[1, 2] <- 0  # df$x[1] = NA by construction of the data set
for (i in 2:nrow(df))
  df$zcount[i] <- ifelse(!is.na(df$x[i]), df$zcount[i - 1] + 1, 0)
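
To make the desired result concrete, here is a small toy example (hypothetical values, not from the real data set) of x, the counter, and the historical mean I am after:

x <- c(NA, 0.010, -0.020, 0.004, NA, 0.012)  # two securities separated by an NA
# zcount (consecutive non-NAs): 0   1      2      3      0   1
# historical mean to date:      NA  0.010 -0.005 -0.002  NA  0.012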

Any idea how to speed this up would be highly appreciated!

Jon E.
    Please review [mcve]. – G. Grothendieck Aug 02 '17 at 16:29
  • See [this example](https://stackoverflow.com/a/42617491/4421870) using `rollapply` to exclude NAs – Mako212 Aug 02 '17 at 17:14
  • Thank you, @Mako212. I have looked at rollapply as well, but as I understand it, it won't do what I am looking for. I may be misinterpreting it, though. As mentioned above, I am not trying to exclude NAs from the calculation, but to reset the 'calculation window' as soon as an NA occurs. rollapply() requires specifying the 'width' parameter, which is why I came up with the approach of creating an additional variable, 'zcount', that counts the consecutive non-NA values. The 'width' parameter in rollapply() is then set to 'zcount'. – Jon E. Aug 02 '17 at 17:36
  • It sounds like you have a single column of prices. Do you also have a single column of security names and another column of date/time stamps? – Justin Aug 02 '17 at 18:09
  • @Justin I have columns of log returns with security identification numbers, prices, time, ordered by security first and then time. – Jon E. Aug 02 '17 at 18:14
  • Do you know how many securities you have and how many time observations for each security? – Justin Aug 02 '17 at 18:16
  • I know the number of unique securities. I don't know the precise number of time observations for each security; they range between one and 1092. Where are you trying to lead me? – Jon E. Aug 02 '17 at 19:03

1 Answer


You will need to reshape the data.frame to apply the cumsum function over each security. Here's how:

First, I'll generate some data on 100 securities over 100 months, which I think corresponds to your description of the data set:

securities <- 100
months <- 100
time <- seq.Date(as.Date("2010/1/1"), by = "months", length.out = months)
ID <- rep(paste0("sec", 1:securities), each = months)  # one block of dates per security
returns <- rnorm(securities * months, mean = 0.08, sd = 2)

df <- data.frame(time, ID, returns)  # time is recycled across securities
head(df)
        time   ID    returns
1 2010-01-01 sec1 -3.0114466
2 2010-02-01 sec1 -1.7566112
3 2010-03-01 sec1  1.6615731
4 2010-04-01 sec1  0.9692533
5 2010-05-01 sec1  1.3075774
6 2010-06-01 sec1  0.6323768

Now, reshape the data so that each column contains the returns of one security and each row represents a date.

library(tidyr)
df_wide <- spread(df, ID, returns)
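
If you are on tidyr 1.0.0 or later, where spread() has been superseded, an equivalent reshape (just a sketch of the same step) uses pivot_wider():

df_wide <- pivot_wider(df, names_from = ID, values_from = returns)  # returns a tibble; the subsetting below works the same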

Once this is done, you can use the apply function over the columns, each of which now represents one security, with either the sum or the cumsum function. Notice the data object df_wide[-1], which drops the time column; this is necessary to keep sum and cumsum from throwing an error on the date column.

matrix_sum <- apply(df_wide[-1], 2, FUN = sum)

matrix_cumsum <- apply(df_wide[-1], 2, FUN = cumsum)
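
Since the question ultimately asks for a historical mean rather than a cumulative sum, the same apply pattern can be adapted. A minimal sketch, assuming that securities with shorter histories are simply padded with NAs in the wide format:

expanding_mean <- function(v) {
  obs <- !is.na(v)                           # TRUE where a return is observed
  m <- cumsum(replace(v, !obs, 0)) / pmax(cumsum(obs), 1)
  m[cumsum(obs) == 0] <- NA                  # no observations yet -> no mean
  m
}

matrix_cummean <- apply(df_wide[-1], 2, FUN = expanding_mean)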

Now, if you like, add the time column back and combine everything into a data.frame:

df_final <- data.frame(time = df_wide[,1], matrix_cumsum)
Justin
  • Thank you, @Justin. I really appreciate your reply, which looks great for total cumulative returns per security. I need, however, time-specific cumulative returns for each security for my analysis, which running *cumsum()* over entire securities will not yield, if I am not mistaken. How would you make it time-specific? For the time being, I went with the approach above, but it'd be helpful to be more efficient next time. – Jon E. Aug 04 '17 at 16:32
  • @JonE. I listed both methods above. The `cumsum` function *will* yield a running total of the cumulative sum at each date. In the example above, where the results are assigned to `matrix_cumsum`, the cumulative sum *is* displayed at each date. Cut and paste my code into your R session and then `View(df_final)` to see the running totals for each security at each date. – Justin Aug 04 '17 at 18:13