1

I have a data table that contains return data of a company looking like this:

enter image description here

df=structure(list(Date = structure(c(13236, 13237, 13238, 13239, 
                                      13240, 13241, 13242, 13243, 13244, 13245, 13246, 13247, 13248, 
                                      13249, 13250, 13251), class = "Date"), IBES = c("@O5G", "@O5G", 
                                                                                      "@O5G", "@O5G", "@O5G", "@O5G", "@O5G", "@O5G", "@O5G", "@O5G", 
                                                                                      "@O5G", "@O5G", "@O5G", "@O5G", "@O5G", "@O5G"), MktAdjReturn = c(-0.00381466643441897, 
                                                                                                                                                        -0.00834070809256926, -0.0193226301897589, NA, NA, -0.00885564092195712, 
                                                                                                                                                        -0.051612619547402, -0.0065292323057804, 0.042244140103735, 0.003100395243401, 
                                                                                                                                                        NA, NA, -0.00486229222347689, -0.0184708840023963, 0.00273824763632391, 
                                                                                                                                                        -0.00510010246255499)), .Names = c("Date", "IBES", "MktAdjReturn"
                                                                                                                                                        ), class = c("data.table", "data.frame"), row.names = c(NA, -16L
                                                                                                                                                        ))

I want to calculate the moving average of the following 5 days excluding the current day. Therefore, the code should skip the NA and take the next available returns. E.g. For the for the first Date 2006-03-29 the 5 days moving average should contain: -0.008340708 ; -0.019322630 ; -0.008855641 ; -0.051612620 ; -0,006529232.

I tried:

rollapply(MktAdjReturn,width = 5,FUN=mean,align = "left",fill = NA,na.rm=T), by=c("IBES")

But, it includes the current date as well and I am not sure what it exactly does with fill = NA.

Cœur
  • 37,241
  • 25
  • 195
  • 267
JB_G
  • 45
  • 7

1 Answers1

1

There are couple of change you need to do. Suppose, you need to group result by IBES then you can use dplyr.

First you should remove rows having values as NA as OP doesnt want to consider those rows in width of roll. We will join the result with main data.frame after performing calculation.

For excluding current row and considering next row you can use lead on rolling mean.

The solution should look as:

library(tidyverse)
library(zoo)


df %>% group_by(IBES) %>%
filter(!is.na(MktAdjReturn)) %>% #Remove rows with NA
mutate(rollmean = lead(rollapply(MktAdjReturn, 5, FUN=mean, na.rm=TRUE,
                                 align = "left", fill = NA))) %>%
  right_join(df, by=c("Date", "IBES")) %>% #Join back
  mutate(LastNonNaMean = max(which(!is.na(rollmean)))) %>%
  fill(rollmean) %>%  #Fill to populate rows with NA values for mean. 
  mutate(rollmean = ifelse(row_number()>LastNonNaMean, NA, rollmean)) %>%
  select(Date, IBES, MktAdjReturn = MktAdjReturn.x, rollmean)

# # A tibble: 16 x 4
# # Groups: IBES [1]
# Date       IBES  MktAdjReturn  rollmean
# <date>     <chr>        <dbl>     <dbl>
# 1 2006-03-29 @O5G     - 0.00381 - 0.0189 
# 2 2006-03-30 @O5G     - 0.00834 - 0.00882
# 3 2006-03-31 @O5G     - 0.0193  - 0.00433
# 4 2006-04-01 @O5G      NA       - 0.00433
# 5 2006-04-02 @O5G      NA       - 0.00433
# 6 2006-04-03 @O5G     - 0.00886 - 0.00353
# 7 2006-04-04 @O5G     - 0.0516    0.00310
# 8 2006-04-05 @O5G     - 0.00653   0.00495
# 9 2006-04-06 @O5G       0.0422  - 0.00452
# 10 2006-04-07 @O5G       0.00310  NA      
# 11 2006-04-08 @O5G      NA        NA      
# 12 2006-04-09 @O5G      NA        NA      
# 13 2006-04-10 @O5G     - 0.00486  NA      
# 14 2006-04-11 @O5G     - 0.0185   NA      
# 15 2006-04-12 @O5G       0.00274  NA      
# 16 2006-04-13 @O5G     - 0.00510  NA
# 
MKR
  • 19,739
  • 4
  • 23
  • 33
  • The problem here is that the code takes the next 5 values including NAs. But instead I need it to skip the NAs and to take the next available returns. E.g. For the for the first Date 2006-03-29 the 5 days moving average should contain: -0.008340708 ; -0.019322630 ; -0.008855641 ; -0.051612620 ; -0,006529232. – JB_G Jul 17 '18 at 18:38
  • @JB_G Okay. Then there can be many ways. Just tell me how will you like to calculate `mean` for `2006-04-01`? – MKR Jul 17 '18 at 18:41
  • @JB_G I have updated my answer to include a way to not consider rows containing NA values as part of width for roll. Have a look. – MKR Jul 17 '18 at 19:50
  • Glad to know it helped. – MKR Jul 17 '18 at 20:16
  • just a quick follow up question: If the first row of a group is NA, it does not calculate the rollmean because it filters NAs out at the very beginning. Do you know how to deal with that? (in the example shown above imagine the MktAdjReturn is NA for "2006-03-29") – JB_G Jul 18 '18 at 07:15