1

I have a further complication to the problem got solved here. Rolling window function for irregular time series that can handle duplicates.

Here is data.

   ticker  val   yr           stop_date
1    A    1.0  2009              2012
2    A    2.0  2010              2020
3    A    3.0  2011              2020
4    A    4.0  2012              2020
5    A    5.0  2013              2020
6    B    2.0  2010              2020 
7    B    3.0  2011              2012
8    B    4.0  2014              2020
9    B    5.0  2019              2020
10   B    6.0  2020              2020

For each ticker of the data, I want to calculate the past 5 year (yr can have duplicates but that's fine) median for each ticker-yr pair. This is a standard rolling time window median calculation with irregular gapped dates.

But on top of that the observations included in calculating the median should not be stale. The stop_date vairiable indicates when this value is stale.

Here is a specific example. Suppose we are calculating the median of past 5 year val on line 4 for ticker A. Without the staleness condition, we will have result from median(c(1,2,3,4)). But since in the current line yr is 2012, therefore val in 2009 should not be included in the calculation since it stops to be valid in 2012. So I want to have median(c(2,3,4)) for line 4.

I have tried

setDT(dt)[.(ticker = ticker, upper = yr, lower = yr-5),
                            on = .(ticker, yr <= upper, yr >lower),
                          .(med = median(val, na.rm = T)), by =.EACHI]

But this apparently does not handle the staleness problem.

So I need the median operation during the non equi join to be conditional on a comparison of the current line yr and all the stop_date in the non equi join yr range.

Can you help?

  • 2
    "rolling" + "conditional" + "median" + "non-equi join" ... sounds scary! – jangorecki Jun 21 '20 at 19:07
  • What does staleness mean? It looks like if you change `yr-5` to `stop_date -5` it won't use the stale data. – at80 Jun 21 '20 at 19:20
  • staleness just means I don't want to use that observation in my calculation of median. And it depends on whether the current line yr is or has passed the stop_date of the past 5 observations. In my example, observation 1 has become stale in 2012. so it will not be considered when calculating the past 5 year median from 2012 onwards. – user9439811 Jun 21 '20 at 20:07
  • U r argonaut? Maybe just add yr>stop_date in the on argument – chinsoon12 Jun 21 '20 at 22:51
  • @user9439811 if comment of chinsoon12 resolved your problem please self answer your question – jangorecki Jun 28 '20 at 23:34
  • No, it dosen't. I ended up writing a loop but still want a data.table solution. – user9439811 Jul 05 '20 at 13:37

0 Answers0