I have a further complication to the problem got solved here. Rolling window function for irregular time series that can handle duplicates.
Here is data.
ticker val yr stop_date
1 A 1.0 2009 2012
2 A 2.0 2010 2020
3 A 3.0 2011 2020
4 A 4.0 2012 2020
5 A 5.0 2013 2020
6 B 2.0 2010 2020
7 B 3.0 2011 2012
8 B 4.0 2014 2020
9 B 5.0 2019 2020
10 B 6.0 2020 2020
For each ticker of the data, I want to calculate the past 5 year (yr can have duplicates but that's fine) median for each ticker-yr pair. This is a standard rolling time window median calculation with irregular gapped dates.
But on top of that the observations included in calculating the median should not be stale. The stop_date vairiable indicates when this value is stale.
Here is a specific example. Suppose we are calculating the median of past 5 year val on line 4 for ticker A. Without the staleness condition, we will have result from median(c(1,2,3,4)). But since in the current line yr is 2012, therefore val in 2009 should not be included in the calculation since it stops to be valid in 2012. So I want to have median(c(2,3,4)) for line 4.
I have tried
setDT(dt)[.(ticker = ticker, upper = yr, lower = yr-5),
on = .(ticker, yr <= upper, yr >lower),
.(med = median(val, na.rm = T)), by =.EACHI]
But this apparently does not handle the staleness problem.
So I need the median operation during the non equi join to be conditional on a comparison of the current line yr and all the stop_date in the non equi join yr range.
Can you help?