1

I have a working example of a weighted sum of transactions over the last 12 hours. Now, I've added an account column and would like to compute this weighted sum separately by group. The code will run as written below. Uncomment the line starting with # account to add the account column to df. How can I modify the second to last line of code such that it computes rollapplyr separately on each account?

library(zoo)
library(tidyverse)

Create the example data:

set.seed(123)
randomDates <- function(N, st="2017-01-01 00:00:00", et="2017-02-01 23:59:59") {
     st <- as.POSIXct(st, tz = "UTC")

     et <- as.POSIXct(et, tz = "UTC")

     dt <- as.numeric(difftime(et,st,units="sec", tz="UTC"))
     ev <- sort(runif(N, 0, dt))
     rt <- st + ev
     rt
}

df <- data.frame(date = randomDates(100) , 
                 data = round( abs(rnorm(100)) * 100 ) # ,
                 # account = sample(c("A", "B", "C"), 100, replace=TRUE  )
)

df <- df %>% arrange(date)

Define the helper functions:

tau <- 0.00005
decay = function(tau, day){
  exp(-tau * day)
}

weighted <- function(x, tau) {
  tx <- as.numeric(time(x))
  seconds <- tail(tx, 1) - tx
  w <- (seconds < 43200) * decay(tau, seconds) # 12 hours in seconds
  sum(w * coredata(x))
}

Compute the rolling sum:

# Would like to modify this block to group by account
newData <- df %>%
  read.zoo  %>% 
  rollapplyr(43200, weighted, tau = tau, partial = TRUE, coredata = FALSE)

dfNew <- df %>% mutate( weighted_sum = newData  )

                   date data weighted_sum
1   2017-01-01 00:21:26   38    38.000000
2   2017-01-01 21:29:53   56    56.000000
3   2017-01-02 14:02:43   34    34.000000
4   2017-01-02 20:41:28    9    19.279179
5   2017-01-03 06:08:07  160   161.644215

I haven't yet found the answer based on my research:

Apply a rolling sum by group in R

use rollapply and zoo to calculate rolling average of a column of variables

https://www.rdocumentation.org/packages/zoo/versions/1.8-1/topics/rollapply

I've also tried this solution based on feedback to this question and a linked, possible duplicate answer. However, applying the same pattern results in an error that I haven't been able to resolve:

newData <- df %>%
  group_by(account) %>% 
  mutate(weighted_sum =  rollapplyr(., width=43200, FUN = weighted, 
                         tau = tau, partial = TRUE, coredata = FALSE)   ) %>% 
  ungroup()

Throws this error:

# Error in mutate_impl(.data, dots) : 
  Evaluation error: non-numeric argument to binary operator.
Bobby
  • 1,585
  • 3
  • 19
  • 42
  • Am I missing something? Where is account in your example data? – CPak Apr 25 '18 at 13:46
  • @CPak I've commented it out on the 11th line of code. This ensures that the code will run as-is until it's un-commented. – Bobby Apr 25 '18 at 13:47
  • Got it. Thanks- – CPak Apr 25 '18 at 13:48
  • @ G. Grothendieck, I've studied example of of your answer to the linked duplicate question. According to this, I've tried using `group_by` before and after `read_zoo`. If I put it before, I get `Error in w * coredata(x) : non-numeric argument to binary operator`. And if I put it after, I get `Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "zoo"` – Bobby Apr 25 '18 at 14:49
  • Also, the linked question doesn't have code for the example `df` so it's difficult to try it out. – Bobby Apr 25 '18 at 14:51
  • I've added an attempted solution based on the linked duplicate. I'd appreciate if you could take a look again. Also, this linked solution uses a regular time series and does not convert to a `zoo` object using `read.zoo` – Bobby Apr 25 '18 at 21:29
  • Create a `roll` function which inputs and outputs a data frame and then use that. Note that to use dot to mean just the rows in current group one must use do: `roll <- function(x) with(x, { z <- rollapplyr(zoo(data, date), 43200, weighted, by.column = FALSE, tau = tau, partial = TRUE, coredata = FALSE); data.frame(account, fortify.zoo(z)) }); df %>% group_by(account) %>% do(roll(.)) %>% ungroup` – G. Grothendieck Apr 26 '18 at 01:58
  • 1
    Your solution worked perfectly for me. I see that this question has been reopened. Would you like to post it as an answer so I can accept it? I think your code here looks sufficiently different from the question previously marked as a duplicate, but if you prefer to close the question again, that's not a problem. – Bobby Apr 26 '18 at 15:54

1 Answers1

0

Use cbind to create a dataframe to give as input to rollapplyr

newData <- df %>%
  group_by(account) %>% 
  mutate(weighted_sum = rollapplyr(cbind(data, date), width=43200, FUN = weighted, 
                         tau = tau, partial = TRUE, coredata = FALSE)[, 1]) %>% 
  ungroup()

newData 
# A tibble: 100 × 4
   date                 data account weighted_sum
   <dttm>              <dbl> <chr>          <dbl>
 1 2017-01-01 06:18:34   220 B              220  
 2 2017-01-01 07:14:31   131 A              131  
 3 2017-01-01 13:38:38    27 C               27  
 4 2017-01-02 01:54:10    54 C               81.0
 5 2017-01-02 02:47:01    41 C              122. 
 6 2017-01-02 18:28:18    48 B              268. 
 7 2017-01-04 00:55:51    79 A              210. 
 8 2017-01-04 05:36:46    59 B              327. 
 9 2017-01-04 07:43:24   165 C              287. 
10 2017-01-04 08:57:38     5 A              215. 
# … with 90 more rows
Julien
  • 1,613
  • 1
  • 10
  • 26