I've got a history table of updates to records, and I want to calculate cumulative totals where values may be added or deleted to the set. (ie the cumulative total for one month may be less than the previous).
For example, here's a table with the history of updates to tags for a person record. (id is the id of the person record).
I want to count how many people had the "established" tag in any given month, accounting for when it was added or removed in a prior month.
+----+------------------------+---------------------+
| id | tags | created_at |
+----+------------------------+---------------------+
| 1 | ["vip", "established"] | 2017-01-01 00:00:00 |
| 2 | ["established"] | 2017-01-01 00:00:00 |
| 3 | ["established"] | 2017-02-01 00:00:00 |
| 1 | ["vip"] | 2017-03-01 00:00:00 |
| 4 | ["established"] | 2017-05-01 00:00:00 |
+----+------------------------+---------------------+
With some help from these posts, I've gotten this far:
SELECT
item_month,
sum(count(distinct(id))) OVER (ORDER BY item_month)
FROM (
SELECT
to_char("created_at", 'yyyy-mm') as item_month,
id
FROM person_history
WHERE tags ? 'established'
) t1
GROUP BY item_month;
Which gives me:
month count
2017-01 2
2017-02 3
2017-05 4 <--- should be 3
And it's also missing an entry for 2017-03 which should be 2.
(An entry for 2017-04 would be nice too, but the UI could always infer it from the previous month if need be)