I have a Postgres table that records an integer value (a request count) every minute.
Several request types on several servers are all stored in the same table:

time                |  key1    | key2      | key3   | value     
-----------------------------------------------------------------------
2017-01-16 18:00:53 | server1  | webpage1  | type1  | 30
2017-01-16 18:00:55 | server1  | webpage2  | type1  | 31
2017-01-16 18:00:58 | server1  | webpage3  | type1  | 32
2017-01-16 18:00:59 | server1  | webpage4  | type1  | 33
2017-01-16 18:01:00 | server1  | webpage5  | type1  | 34
2017-01-16 18:01:01 | server1  | webpage6  | type1  | 35
2017-01-16 18:01:02 | server1  | webpage7  | type1  | 36
2017-01-16 18:01:03 | server1  | webpage8  | type1  | 37
2017-01-16 18:01:04 | server1  | webpage1  | type1  | 56
2017-01-16 18:01:06 | server1  | webpage2  | type1  | 35
2017-01-16 18:01:07 | server1  | webpage3  | type1  | 43
2017-01-16 18:01:10 | server1  | webpage4  | type1  | 64
2017-01-16 18:01:13 | server1  | webpage5  | type1  | 44
2017-01-16 18:01:14 | server1  | webpage6  | type1  | 66
2017-01-16 18:01:16 | server1  | webpage7  | type1  | 56
2017-01-16 18:01:18 | server1  | webpage8  | type1  | 22

Assume key1 and key3 have different values as well (for the sake of this example I omitted some data).

For each group (key1, key2, key3), I need the latest value minus the value immediately before it, which gives me the rate per minute.

I successfully got the result of the latest and 1 offset of it in the same table (grouped by key):

SELECT * FROM 
(SELECT ROW_NUMBER() 
        OVER(PARTITION BY key1, key2, key3 ORDER BY time DESC) as rnum,
 time, key1, key2, key3, value FROM test ORDER BY time DESC) a
WHERE rnum < 3;

So the result is:

 rnum |        time         |  key1   |   key2   | key3  | value
------+---------------------+---------+----------+-------+-------
    1 | 2017-01-16 18:01:18 | server1 | webpage8 | type1 |    22
    1 | 2017-01-16 18:01:16 | server1 | webpage7 | type1 |    56
    1 | 2017-01-16 18:01:14 | server1 | webpage6 | type1 |    66
    1 | 2017-01-16 18:01:13 | server1 | webpage5 | type1 |    44
    1 | 2017-01-16 18:01:10 | server1 | webpage4 | type1 |    64
    1 | 2017-01-16 18:01:07 | server1 | webpage3 | type1 |    43
    1 | 2017-01-16 18:01:06 | server1 | webpage2 | type1 |    35
    1 | 2017-01-16 18:01:04 | server1 | webpage1 | type1 |    56
    2 | 2017-01-16 18:01:03 | server1 | webpage8 | type1 |    37
    2 | 2017-01-16 18:01:02 | server1 | webpage7 | type1 |    36
    2 | 2017-01-16 18:01:01 | server1 | webpage6 | type1 |    35
    2 | 2017-01-16 18:01:00 | server1 | webpage5 | type1 |    34
    2 | 2017-01-16 18:00:59 | server1 | webpage4 | type1 |    33
    2 | 2017-01-16 18:00:58 | server1 | webpage3 | type1 |    32
    2 | 2017-01-16 18:00:55 | server1 | webpage2 | type1 |    31
    2 | 2017-01-16 18:00:53 | server1 | webpage1 | type1 |    30

Now, I thought I could take the value column at MIN(time) and at MAX(time) and calculate the difference, but I cannot "merge" the two rows of each group.

After @HartCO's comment, I was able to do this:

select time, new_val-last_val, key1, key2, key3 from
  (select distinct max(time)  over(partition by key1, key2, key3) as time,
          max(value) over(partition by key1, key2, key3) as new_val,
          min(value) over(partition by key1, key2, key3) as last_val,
          key1, key2, key3
   from (select row_number() over(partition by key1, key2, key3 order by time desc) as rnum, 
                time, key1, key2, key3, value from test order by time desc) a 
where rnum < 3) b;

And I got:

time                | ?column? |  key1   |   key2   | key3
--------------------+----------+---------+----------+-------
2017-01-16 18:01:14 |       31 | server1 | webpage6 | type1
2017-01-16 18:01:18 |       15 | server1 | webpage8 | type1
2017-01-16 18:01:16 |       20 | server1 | webpage7 | type1
2017-01-16 18:01:04 |       26 | server1 | webpage1 | type1
2017-01-16 18:01:13 |       10 | server1 | webpage5 | type1
2017-01-16 18:01:06 |        4 | server1 | webpage2 | type1
2017-01-16 18:01:07 |       11 | server1 | webpage3 | type1
2017-01-16 18:01:10 |       31 | server1 | webpage4 | type1

But the desired output for webpage8 is -15 (22 - 37), not 15.

gabi
  • You seem close. Just as you used `ROW_NUMBER() OVER()`, you can use `MIN(time) OVER(...)`, and the same for `MAX()`, to get those values relative to the group of rows defined by the `PARTITION BY` clause (no `ORDER BY` needed for those two). – Hart CO Jan 16 '17 at 20:02
  • @HartCO - can you please give an example of MAX and MIN with OVER? – gabi Jan 16 '17 at 20:22
  • @HartCO - See my edit. But it is not good enough: if the counter decreases I will not catch it, because I am taking max(val) and min(val), which are not correlated with max(time) and min(time). – gabi Jan 16 '17 at 20:34
  • Could you take your sample data and show what the desired output is to help clarify? – Hart CO Jan 16 '17 at 20:49
  • @HartCO - Yes , I will edit my sample data with that specific case and add the desired output – gabi Jan 16 '17 at 20:50
  • See http://stackoverflow.com/questions/24691462/postgresql-calculate-difference-between-rows. This'll be a lot easier using [`lag()`](https://www.postgresql.org/docs/current/static/functions-window.html). – Ilja Everilä Jan 16 '17 at 21:24
  • Re-reading your updated question, I agree with @IljaEverilä `LAG()` and `LEAD()` can be used to return values from other rows, and you can use those in calculations. – Hart CO Jan 16 '17 at 21:30

1 Answer

Differences between rows offset by some amount are best handled with the lag() and lead() window functions. To keep only the latest row per group you can use DISTINCT ON combined with ORDER BY, if your tables aren't huge. Note that DISTINCT ON is a PostgreSQL extension.

SELECT DISTINCT ON (key1, key2, key3)
       time,
       key1,
       key2,
       key3,
       value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time) 
FROM test
ORDER BY key1, key2, key3, time DESC;

This gives us

        time         |    key1    |    key2     |   key3   | ?column? 
---------------------+------------+-------------+----------+----------
 2017-01-16 18:01:04 |  server1   |  webpage1   |  type1   |       26
 2017-01-16 18:01:06 |  server1   |  webpage2   |  type1   |        4
 2017-01-16 18:01:07 |  server1   |  webpage3   |  type1   |       11
 2017-01-16 18:01:10 |  server1   |  webpage4   |  type1   |       31
 2017-01-16 18:01:13 |  server1   |  webpage5   |  type1   |       10
 2017-01-16 18:01:14 |  server1   |  webpage6   |  type1   |       31
 2017-01-16 18:01:16 |  server1   |  webpage7   |  type1   |       20
 2017-01-16 18:01:18 |  server1   |  webpage8   |  type1   |      -15
(8 rows)
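
Because DISTINCT ON is PostgreSQL-specific, the same result can probably be obtained with window functions alone. The following is a minimal sketch rather than part of the original answer: it assumes the same test table, and the names diff, rnum and ranked are introduced here purely for illustration.

```sql
-- Sketch: latest row per group plus its difference from the previous row,
-- using only lag() and row_number() instead of DISTINCT ON.
SELECT time, key1, key2, key3, diff
FROM (
    SELECT time,
           key1,
           key2,
           key3,
           -- Difference from the previous row of the same group (NULL for the first row)
           value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time) AS diff,
           -- rnum = 1 marks the latest row of each group
           row_number() OVER (PARTITION BY key1, key2, key3 ORDER BY time DESC) AS rnum
    FROM test
) ranked
WHERE rnum = 1
ORDER BY key1, key2, key3;
```

The lag() expression is evaluated over all rows of the group before the outer WHERE discards everything except the latest row, so the difference is still taken against the immediately preceding row.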

Of course you can use other well-known solutions, for example a left join.

WITH diffs AS (
    SELECT time,
           key1,
           key2,
           key3,
           value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time)
    FROM test)
SELECT d1.*
FROM diffs d1
LEFT JOIN diffs d2
  ON (d1.key1, d1.key2, d1.key3) = (d2.key1, d2.key2, d2.key3)
 -- This allows us to single out the greatest row
 AND d1.time < d2.time
WHERE d2.time IS NULL
-- Ordering is just for show
ORDER BY d1.key1, d1.key2, d1.key3;

With PostgreSQL 9.5 the planner recognized this pattern and used an anti join in the final query plan. You could also get similar results using NOT EXISTS.
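
A NOT EXISTS version might look like the sketch below. It reuses the diffs CTE from the left join query; the diff column alias is an addition made here for readability, and the resulting plan has not been checked, so treat the anti join behavior as an assumption to verify with EXPLAIN.

```sql
WITH diffs AS (
    SELECT time,
           key1,
           key2,
           key3,
           value - lag(value) OVER (PARTITION BY key1, key2, key3 ORDER BY time) AS diff
    FROM test)
SELECT d1.*
FROM diffs d1
-- Keep a row only if no later row exists in the same group
WHERE NOT EXISTS (
    SELECT 1
    FROM diffs d2
    WHERE (d2.key1, d2.key2, d2.key3) = (d1.key1, d1.key2, d1.key3)
      AND d2.time > d1.time)
-- Ordering is just for show
ORDER BY d1.key1, d1.key2, d1.key3;
```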

Ilja Everilä