1

Suppose I have the following table where there are two sets of observation dates (2015-01-01, 2016-01-01) in the first column. For each observation date, there are associated item_date and item_value.

observation_date item_date item_value
2015-01-01 2012-12-31 0
2015-01-01 2013-03-31 1
2015-01-01 2013-06-30 2
2015-01-01 2013-09-30 3
2015-01-01 2013-12-31 4
2015-01-01 2014-03-31 5
2015-01-01 2014-06-30 6
2015-01-01 2014-09-30 7
2016-01-01 2013-09-30 8
2016-01-01 2013-12-31 9
2016-01-01 2014-03-31 10
2016-01-01 2014-06-30 11
2016-01-01 2014-09-30 12
2016-01-01 2014-12-31 13
2016-01-01 2015-03-31 14
2016-01-01 2015-06-30 15
2016-01-01 2015-09-30 16

If I were to group by observation_date and get max(item_date), I would get the following result.

observation_date item_date item_value
2015-01-01 2014-09-30 7
2016-01-01 2015-09-30 16

Now instead of the max aggregate function, I'd like to create my own function. My goal is as follows: For each observation date, I would like to return the row where the item_date is n years before the max(item_date). For example, if n=1, I would get the rows where the item_date is 1 year ago from the max(item_date).

observation_date item_date item_value
2015-01-01 2013-09-30 3
2016-01-01 2014-09-30 12

Note that I simply do not want just the dates, but the whole row. I've been looking at sqlite's create_aggregate functionality but I don't know how to return the full row. Any ideas how I can do this?

If there are more efficient ways to achieve this, please let me know.

forpas
  • 160,666
  • 10
  • 38
  • 76
kyc12
  • 349
  • 2
  • 15
  • Are you certain that there is always a row with a date equal to max date - 1 year for each observation_date? – forpas Jul 07 '21 at 08:20
  • If you are using `n=1` would an `item_date` 11 months before not show? And if I understand correctly, you want only 1 record, correct? – Rui Costa Jul 07 '21 at 08:39
  • @RuiCosta Why would `n=1` show a date 11 months ago instead of 12? And yes, I want only 1 record – kyc12 Jul 07 '21 at 16:52
  • @forpas Yes, if there are enough rows preceding the date, there will always be date equal to max date - 1 year and so on. `item_date` is a sequence of continuous quarterly end dates – kyc12 Jul 07 '21 at 16:57

1 Answers1

2

You need a correlated subquery that returns for each observation_date the max date minus ? years:

sql = """
SELECT t1.*
FROM tablename t1
WHERE t1.item_date = (
  SELECT DATE(MAX(t2.item_date), '-' || ? || ' year') 
  FROM tablename t2 
  WHERE t2.observation_date = t1.observation_date
);
"""
cursor = conn.cursor()
cursor.execute(sql, ("1",))

See the demo.

forpas
  • 160,666
  • 10
  • 38
  • 76