0

Given a dataframe containing a timeseries with irrgularly spaced intervals, defined as:

df <- data.frame(date = as.Date("2016-01-01") + ((1:100) + sample(1:5, 100, replace = TRUE)), 
data = rnorm(100) )

How can I calculate a rolling sum of the data column over the previous 30 days, with weights defined by this decay function?

tau <- 0.05
decay = function(tau, day){
  exp(-tau * day)
}

The current day's data then has a weight of 1 and the data from 30 days ago has a weight of decay(0.05, 30) = 0.2231302. Missing days from the input time series should still be accounted for in computing the weights using the decay function.

If possible, I would like to convert the data frame to a zoo or xts object and then use the rollapplyr function or similar, and to do this with dplyr pipes.

Bobby
  • 1,585
  • 3
  • 19
  • 42

2 Answers2

2

Define a function weighted that takes the last 30 points and from those only keeps the points within 30 days of the last one. Then using those it multiplies that by the weights.

In the pipeline we convert df to zoo and then use rollapplyr with weighted. Note that it is important that we use coredata = FALSE so that the time index is passed to weighted. Without that it would not be.

library(dplyr)
library(zoo)

weighted <- function(x, tau) {
  tx <- time(x)
  cx <- coredata(x)[tx > tail(tx, 1) - 30] # only keep if within 30 days
  w <- decay(tau, seq(to = 0, by = -1, length = length(cx)) )
  sum(w * cx)
}

df %>%
  read.zoo %>%
  rollapplyr(30, weighted, tau = tau, partial = TRUE, coredata = FALSE)

If you want to treat missing days as 0 then use this instead:

weighted <- function(x, tau) {
  tx <- as.numeric(time(x))
  days <- tail(tx, 1) - tx
  w <- (days < 30) * decay(tau, days)
  sum(w * coredata(x))
}

Note

We have used the following input modified from the question by adding set.seed for reproducibility. Also the code used in the question might by chance give rise to multiple values with the same date and we eliminated such duplicates.

set.seed(123)
df <- data.frame(date = as.Date("2016-01-01") + 1:100 + sample(1:5, 100, replace = TRUE), 
  data = rnorm(100) )
df <- df[!duplicated(df$date), ]

tau <- 0.05
decay = function(tau, day){
  exp(-tau * day)
}
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This looks great, but I'm not sure that it's giving the correct result. I've used the same seed and I see these dates `2016-01-04 2016-01-07 2016-01-08 2016-01-10 2016-01-11 2016-01-13` with these values: `0.2533185 0.2124173 1.7185282 3.0033168 2.6310724 2.6266077`. The first value looks fine, but the second should be `-0.02854676 + 0.25331851 * decay(0.05,3)`. This is because Jan 4 is 3 days before Jan 7. It gives the calculated number in case I use `decay(0.05,1)` there. And I'm not sure how the 3rd value is calculated. I will prepare my actual data w/o duplicate dates, thanks! – Bobby Apr 24 '18 at 15:47
  • Thanks for the quick reply. I wrote in the question that the current day has a weight of 1 and previous days are weighted according to the decay function that I specified: The current day's data then has a weight of 1 and the data from 30 days ago has a weight of decay(0.05, 30) = 0.2231302. Could you let me know how I should modify the question so that it's more clear? – Bobby Apr 24 '18 at 16:19
  • Thanks I’ll test that soon. You’re really helping me to learn all of the possibilities for window functions, which is great. Any general tips while studying? – Bobby Apr 24 '18 at 18:20
  • By the way, I've found your quick reference, this looks helpful. https://cran.r-project.org/web/packages/zoo/vignettes/zoo-quickref.pdf And I'm going through this course on Datacamp: https://campus.datacamp.com/courses/manipulating-time-series-data-in-r-with-xts-zoo – Bobby Apr 24 '18 at 19:37
  • I think there was a bug in the code in my comment. I have fixed it hopefully and transferred it to the answer. – G. Grothendieck Apr 24 '18 at 21:53
  • The answer is correct according to my expectations of the output and this has helped me tremendously. I've updated the question based on your feedback. If you feel that the question is still not precise enough, could you also make any necessary corrections there? – Bobby Apr 25 '18 at 08:20
  • I've modified the code to use `POSIXct` objects rather than just `dates` and have now extended it to apply the window over seconds rather than days. I'm running into some trouble when adding separate accounts and therefore I've posted another question here: https://stackoverflow.com/questions/50023898/compute-a-rolling-weighted-sum-by-group – Bobby Apr 25 '18 at 13:44
0

I am not sure about pipes, but this should get you going:

d <- decay(tau, 29:0)
rollapply(df, 30, function(z) {
  data <- as.data.frame(z, stringsAsFactors = FALSE)
  data$data <- as.numeric(data$data)
  sum(data$data * d)
}, by.column = FALSE)
Bulat
  • 6,869
  • 1
  • 29
  • 52
  • The function looks nice but doesn't give a result the same length as the original input. It has length 37 but the input has length 66. – Bobby Apr 24 '18 at 16:03