Somewhat related to this question and this one, I'm having trouble calculating a rolling sum. Unlike those questions, I would like to use `zoo::rollsum`, analogous to the `rollapply` answer here. (But if there is a more `data.table` way to do it, by all means suggest it.)
Let's start with some data:
```r
library(data.table)
library(zoo)

set.seed(123)
some_dates <- function(){as.Date('1980-01-01') + sort(sample.int(1e4,100))}
d <- data.table(cust_id = c(rep(123,100),rep(456,100)),
                purch_dt = c(some_dates(), some_dates()),
                purch_amt = round(runif(200, 1, 100),2) )
head(d)
#    cust_id   purch_dt purch_amt
# 1:     123 1980-01-08     24.63
# 2:     123 1980-09-03     96.27
# 3:     123 1981-02-24     60.54
```
I would like to do a rolling 365-day sum of purchase amount for each customer, calculated at each transaction day.
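Since the question also invites a more `data.table`-style approach, here is a minimal sketch that skips the daily dummy table entirely and uses a non-equi self-join. It assumes the window is the transaction day plus the 364 days before it (inclusive), which matches the desired output shown further down. `win`, `win_start`, `win_end` and `res` are hypothetical names chosen for illustration; `allow.cartesian = TRUE` is set defensively because the rows matched before aggregation can exceed `nrow(x) + nrow(i)`.

```r
library(data.table)

set.seed(123)
some_dates <- function() as.Date("1980-01-01") + sort(sample.int(1e4, 100))
d <- data.table(cust_id   = c(rep(123, 100), rep(456, 100)),
                purch_dt  = c(some_dates(), some_dates()),
                purch_amt = round(runif(200, 1, 100), 2))

# Hypothetical helper table: one window per transaction, covering the
# transaction day and the 364 days before it (both ends inclusive).
win <- d[, .(cust_id, win_start = purch_dt - 364L, win_end = purch_dt)]

# Non-equi self-join: for each row of `win`, match the same customer's
# purchases dated inside the window and sum their amounts (x's purch_amt).
# by = .EACHI aggregates once per row of `win`.
res <- d[win,
         on = .(cust_id, purch_dt >= win_start, purch_dt <= win_end),
         .(purch_365 = sum(purch_amt)),
         by = .EACHI, allow.cartesian = TRUE]

# by = .EACHI gives one result row per row of `win`, in `win`'s (= d's) order
d[, purch_365 := res$purch_365]
head(d)
```

The design choice here is that no calendar-day grid is materialized, so the work scales with the number of purchases per customer rather than with the number of days in the date range.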
The answer here suggests the following approach:
First, create dummy rows for all customer-date pairs using a cross join, i.e. something like:
```r
setkey(d, cust_id, purch_dt)
dummy <- d[ CJ(unique(cust_id), seq(min(purch_dt), max(purch_dt), by='day') ) ]
#    cust_id   purch_dt purch_amt
# 1:     123 1980-01-08     24.63
# 2:     123 1980-01-09        NA
# 3:     123 1980-01-10        NA
```
So far, so good (although I'm sure there's a way to tighten this dummy table to the customer-level min/max purch_dt).
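One way to tighten the dummy table to each customer's own min/max `purch_dt` is to build the daily date sequence per customer with `by = cust_id` and then right-join `d` onto it with `on =`, instead of using `CJ` over the global date range. A minimal sketch; `grid` is just an illustrative name:

```r
library(data.table)

set.seed(123)
some_dates <- function() as.Date("1980-01-01") + sort(sample.int(1e4, 100))
d <- data.table(cust_id   = c(rep(123, 100), rep(456, 100)),
                purch_dt  = c(some_dates(), some_dates()),
                purch_amt = round(runif(200, 1, 100), 2))

# One row per calendar day, spanning only that customer's first..last purchase
grid <- d[, .(purch_dt = seq(min(purch_dt), max(purch_dt), by = "day")), by = cust_id]

# Right join onto the grid: days without a purchase get purch_amt = NA
dummy <- d[grid, on = .(cust_id, purch_dt)]
head(dummy)
```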
My problem is how to use `rollsumr` to calculate a trailing 365-day sum.
I tried:
```r
dummy[, purch_365 := rollsumr(x=purch_amt, k=365, na.rm=TRUE) , by=cust_id]
```
But this creates `purch_365` as all `NA`s and gives two warnings like:
```
Warning messages:
1: In `[.data.table`(dummy, , `:=`(purch_365, rollsumr(x = purch_amt,  :
  Supplied 9550 items to be assigned to group 1 of size 9914 in column 'purch_365' (recycled leaving remainder of 364 items).
```
I get that 364 = k - 1, and that there are 2 warnings because there are 2 `cust_id`s. Other than that, I'm at a loss.
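The warning arithmetic can be checked on a toy vector: without `fill`, `rollsumr` returns only n - k + 1 values (here 9914 - 365 + 1 = 9550), so `:=` recycles them to fill the group of 9914 rows. Passing `fill = NA` keeps one value per input element. Separately, the all-`NA` result is most likely due to the `NA` values in `purch_amt` on the dummy (non-purchase) days, which `rollsum` does not appear to skip; for a sum, replacing them with 0 is harmless. The data below is a toy example, not the question's data:

```r
library(zoo)

x <- c(1, 2, 3, 4, 5)   # toy data, n = 5

# k = 3 without fill: only n - k + 1 = 3 complete windows, so 3 values come back
short  <- rollsumr(x, k = 3)
# fill = NA pads the first k - 1 positions, keeping one value per element of x
padded <- rollsumr(x, k = 3, fill = NA)

length(short)   # 3
padded          # NA NA 6 9 12
```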
```r
# Desired output:
#    cust_id   purch_dt purch_amt purch_365
# 1:     123 1980-01-08     24.63     24.63
# 2:     123 1980-09-03     96.27    120.90
# 3:     123 1981-02-24     60.54    156.81
```
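A possible sketch that produces this shape of output while still using `rollsumr`, assuming the window is the transaction day plus the previous 364 days. Non-purchase days are treated as zero spend, and each customer's daily series is left-padded with 364 zeros, so `rollsumr(..., k = 365)` returns exactly one value per row (no recycling) and the earliest days get partial sums, as in the first row above. `grid` and `res` are illustrative names:

```r
library(data.table)
library(zoo)

set.seed(123)
some_dates <- function() as.Date("1980-01-01") + sort(sample.int(1e4, 100))
d <- data.table(cust_id   = c(rep(123, 100), rep(456, 100)),
                purch_dt  = c(some_dates(), some_dates()),
                purch_amt = round(runif(200, 1, 100), 2))

# Daily grid spanning each customer's own first..last purchase;
# non-purchase days come out of the join with purch_amt = NA.
grid  <- d[, .(purch_dt = seq(min(purch_dt), max(purch_dt), by = "day")), by = cust_id]
dummy <- d[grid, on = .(cust_id, purch_dt)]
setorder(dummy, cust_id, purch_dt)

# Per customer: NA -> 0, left-pad with 364 zeros, right-aligned 365-day sum.
# Output length is (n + 364) - 365 + 1 = n, one value per row of the group.
dummy[, purch_365 := rollsumr(c(rep(0, 364), replace(purch_amt, is.na(purch_amt), 0)),
                              k = 365),
      by = cust_id]

# Keep only the actual transaction days
res <- dummy[!is.na(purch_amt)]
head(res)
```

This keeps the question's dummy-table idea; the cost is one row per customer per calendar day, which is what the zero-padding and the daily grid require.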
Thanks in advance!