1

I've got a history table of updates to records, and I want to calculate cumulative totals where values may be added or deleted to the set. (ie the cumulative total for one month may be less than the previous).

For example, here's a table with the history of updates to tags for a person record. (id is the id of the person record).

I want to count how many people had the "established" tag in any given month, accounting for when it was added or removed in a prior month.

+----+------------------------+---------------------+
| id |          tags          |     created_at      |
+----+------------------------+---------------------+
|  1 | ["vip", "established"] | 2017-01-01 00:00:00 |
|  2 | ["established"]        | 2017-01-01 00:00:00 |
|  3 | ["established"]        | 2017-02-01 00:00:00 |
|  1 | ["vip"]                | 2017-03-01 00:00:00 |
|  4 | ["established"]        | 2017-05-01 00:00:00 |
+----+------------------------+---------------------+

With some help from these posts, I've gotten this far:

SELECT 
  item_month,
  sum(count(distinct(id))) OVER (ORDER BY item_month)
FROM (
  SELECT 
    to_char("created_at", 'yyyy-mm') as item_month,
    id
  FROM person_history 
  WHERE tags ? 'established'
) t1
GROUP BY item_month;

Which gives me:

month   count
2017-01 2
2017-02 3
2017-05 4 <--- should be 3

And it's also missing an entry for 2017-03 which should be 2.

(An entry for 2017-04 would be nice too, but the UI could always infer it from the previous month if need be)

ChrisJ
  • 2,486
  • 21
  • 40
  • Unrelated, but: `distinct` is **NOT** a function. `distinct(id)` is the same as `distinct id` –  May 04 '18 at 07:35
  • 1
    The `sum() over (..)` calculates a running total. So the value for May includes the values for January and February. So the value 4 is correct. What do you think that should be 3? –  May 04 '18 at 07:42

1 Answers1

2

Here is step-by-step tutorial, you could try to collapse all those CTEs:

with 
  -- Example data
  person_history(id, tags, created_at) as (values
    (1, '["vip", "est"]'::jsonb, '2017-01-01'::timestamp),
    (2, '["est"]', '2017-01-01'), -- Note that Person 2 changed its tags several times per month 
    (2, '["vip"]', '2017-01-02'),
    (2, '["vip", "est"]', '2017-01-03'),
    (3, '["est"]', '2017-02-01'),
    (1, '["vip"]', '2017-03-01'),
    (4, '["est"]', '2017-05-01')),
  -- Get the last tags for each person per month
  monthly as (
    select distinct on (id, date_trunc('month', created_at))
      id,
      date_trunc('month', created_at) as month,
      tags,
      created_at
    from person_history
    order by 1, 2, created_at desc),
  -- Retrieve tags from previous month
  monthly_prev as (
    select
      *,
      coalesce((lag(tags) over (partition by id order by month)), '[]') as prev_tags
    from monthly),
  -- Calculate delta: if "est" was added then 1, removed then -1, nothing heppens then 0
  monthly_delta as (
    select
      *,
      case
        when tags ? 'est' and not prev_tags ? 'est' then 1
        when not tags ? 'est' and prev_tags ? 'est' then -1
        else 0
      end as delta
    from monthly_prev),
  -- Sum all deltas for each month
  monthly_total as (
    select month, sum(delta) as total
    from monthly_delta
    group by month)
-- Finally calculate cumulative sum
select *, sum(total) over (order by month) from monthly_total
order by month;

Result:

┌─────────────────────┬───────┬─────┐
│        month        │ total │ sum │
├─────────────────────┼───────┼─────┤
│ 2017-01-01 00:00:00 │     2 │   2 │
│ 2017-02-01 00:00:00 │     1 │   3 │
│ 2017-03-01 00:00:00 │    -1 │   2 │
│ 2017-05-01 00:00:00 │     1 │   3 │
└─────────────────────┴───────┴─────┘
Abelisto
  • 14,826
  • 2
  • 33
  • 41
  • Just curious and pardon my ignorance: In `postgres`, does the `?` work similar to `~` ?... or is it a user-defined operator ? – Kaushik Nayak May 04 '18 at 18:34
  • @KaushikNayak You are welcome :) `?` is built-in JSONB operator: [link](https://www.postgresql.org/docs/current/static/functions-json.html#FUNCTIONS-JSONB-OP-TABLE). Unfortunately in the doc it is not mentioned that it could be used not only for object keys but also for array elements. – Abelisto May 04 '18 at 19:54