
I have a table called 'ts' in postgres that looks like this:

dev -- numeric device id
ts -- Unix epoch timestamp
key -- string (only interested in records where key is 'u')
val -- int representing uptime in ms

I am logging process uptime to this table, and I do so every 30 seconds or so. The result is a table that has an ever-increasing val, which, periodically, resets to 0 when the process restarts, creating a sawtooth like pattern. Note that the actual value logged may not be 0, as logging may not happen immediately upon restart.

What I'd like to do is clean up the table a bit by periodically culling the values that do not represent the pre-reset max uptime. There are multiple devices mixed into the same table, so that max uptime would have to be grouped by dev.

Example:

dev ts      key  val
1   100000  'u'  50      -- boring
1   130100  'u'  30050   -- delete this
1   160100  'u'  60050   -- >> keep this one
1   190200  'u'  100     -- this record dies
1   220200  'u'  30100   -- >> keep this one too
1   250200  'u'  300    

I want a query to pick out all the records except those I've marked above, which are uninteresting and can be deleted.

The culling will be run periodically in a batch process.
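
In case someone wants to try this out, here is roughly what the table and sample data look like (the column types are approximate; the real schema may differ):

CREATE TABLE ts (
    dev bigint,    -- numeric device id
    ts  bigint,    -- Unix epoch timestamp
    key text,      -- only rows with key = 'u' are of interest
    val integer    -- uptime in ms
);

INSERT INTO ts (dev, ts, key, val) VALUES
    (1, 100000, 'u',    50),
    (1, 130100, 'u', 30050),
    (1, 160100, 'u', 60050),
    (1, 190200, 'u',   100),
    (1, 220200, 'u', 30100),
    (1, 250200, 'u',   300);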

Watusimoto

3 Answers


If you just want to keep local maxima, you can use lead() and lag():

select t.*
from (select t.*,
             lead(val) over (partition by dev order by ts) as next_val,
             lag(val) over (partition by dev order by ts) as prev_val
      from t
      where key = 'u'
     ) t
where val > prev_val and val > next_val;
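
Since the culling runs as a periodic batch job, the keep-query can be turned around into a delete. A rough sketch of how that might look (my assumptions: the table is really named ts as in the question, and (dev, ts) identifies a row; note that this removes every row that is neither a strict local maximum nor the newest sample of its device):

DELETE FROM ts t
WHERE t.key = 'u'
  AND (t.dev, t.ts) NOT IN (
        -- the strict local maxima from the query above
        SELECT dev, ts
        FROM (SELECT dev, ts, val,
                     lead(val) over (partition by dev order by ts) as next_val,
                     lag(val) over (partition by dev order by ts) as prev_val
              FROM ts
              WHERE key = 'u') m
        WHERE val > prev_val AND val > next_val
      )
  -- keep the newest sample per device so the currently growing run survives
  AND t.ts < (SELECT max(x.ts) FROM ts x WHERE x.dev = t.dev AND x.key = 'u');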
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I didn't know about lead() and lag(). Those are pretty cool. – Watusimoto Sep 13 '18 at 14:20
  • Pay attention: if you have the same (max) value twice, then this would not work. Maybe 50 30050 *60050 60050* 100 ... – S-Man Sep 13 '18 at 15:29
  • I'm glad you pointed this out, though it is not relevant to my particular situation, as I am using timestamps that are ever-increasing. The only way I could get duplicate values is if the process restarts twice shortly after logging, in time to be caught twice with the same (short) uptime. If this happens, I don't want the duplicate record to be deleted. But your point could be very important to someone with a similar problem but different details in their situation. – Watusimoto Sep 17 '18 at 05:06

Instead, why not update the relevant record itself, and only insert a new record when the process restarts?
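
A rough sketch of what that could look like, assuming the logger knows whether the process has restarted since its last write (names as in the question):

-- first log after a (re)start: insert a fresh row for this run
INSERT INTO ts (dev, ts, key, val) VALUES (1, 100000, 'u', 50);

-- every later log of the same run: overwrite that row instead of inserting
UPDATE ts
SET ts = 130100, val = 30050
WHERE dev = 1
  AND key = 'u'
  AND ts = (SELECT max(ts) FROM ts WHERE dev = 1 AND key = 'u');

That keeps the table at one row per run, so there is nothing to cull afterwards.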

user3097732

Because it's fun: I used the new PostgreSQL 11 feature "GROUPS in window functions" to solve the problem of tied local maxima.

The problem:

dev key ts      val
1   u   100000  50
1   u   130100  30050
1   u   160100  60050 -- really single local maximum
1   u   190200  100
1   u   220200  30100 -- local maximum together with next value
1   u   250200  30100 
1   u   300000  300
1   u   500000  100
1   u   550000  1000  -- a (tied) local maximum if only 1 row before and 1 row after is checked, which is wrong
1   u   600000  1000
1   u   650000  2000  -- real local maximum together with the 2 following rows
1   u   700000  2000
1   u   720000  2000
1   u   750000  300

The new PostgreSQL 11 feature:

JOOQ Blog Post explains the feature

The Postgres 11 documentation

demo: db<>fiddle

SELECT 
    dev, key, ts, val 
FROM (
     SELECT
        *, 
        -- B: maximum over the previous, current and following group of equal values
        max(val) over (order by sum, val GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as local_max 
     FROM (
        SELECT -- A: number the runs of equal consecutive values
            *, 
            sum(is_diff) over (order by ts) as sum
        FROM (
            SELECT 
                *,
                CASE WHEN val = lag(val) over (order by ts) THEN 0 ELSE 1 END as is_diff
            FROM test t
            WHERE key = 'u'
        )s
    )s
)s 
WHERE val = local_max;

A: This is just a preparation step. Window functions need a certain order. With a PARTITION BY val the table would be ordered by val first, but in this example we want to keep the order given by ts and only then do the window-function magic on val. So in this part I am calculating a group number for equal values in directly consecutive rows while keeping the order by ts. (Maybe this could be done in a better way?)
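
Extracted from the full statement above, step A on its own is just:

SELECT
    *,
    sum(is_diff) over (order by ts) as sum   -- group number per run of equal values
FROM (
    SELECT
        *,
        CASE WHEN val = lag(val) over (order by ts) THEN 0 ELSE 1 END as is_diff
    FROM test t
    WHERE key = 'u'
)s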

The result is this:

dev key ts      val     is_diff  sum
1   u   100000  50      1        1
1   u   130100  30050   1        2
1   u   160100  60050   1        3
1   u   190200  100     1        4
1   u   220200  30100   1        5     \ same group
1   u   250200  30100   0        5     /
1   u   300000  300     1        6
1   u   500000  100     1        7
1   u   550000  1000    1        8     \ same group
1   u   600000  1000    0        8     /
1   u   650000  2000    1        9     \
1   u   700000  2000    0        9     | same group
1   u   720000  2000    0        9     /
1   u   750000  300     1        10

B: This is the new Postgres 11 feature. It is now possible to frame a window by groups of peer rows instead of a fixed number of rows. Previously you could only look a fixed number of rows ahead or behind; now you can look at the value of the neighbouring group. In other words: if you have three rows with the same value, you can check the next or previous value that is not tied, no matter how many rows are tied. This solves the problem in a very cool way:

For the example with the two 1000 values: now we can check whether the next value is greater than the current one. No, it's the same, so it belongs to the same group. So let's look at the following group instead. That's 2000, which is greater, so the current row cannot be a local maximum.

With this GROUPS frame you can take the maximum over the surrounding groups, which gives you the local maximum even when there are tied values.
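
To see just the frame mechanics in isolation, here is a tiny self-contained sketch using toy data with the group numbers from the second half of the example (not part of the original table):

SELECT
    val, grp,
    max(val) over (order by grp GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as local_max
FROM (VALUES (100, 7), (1000, 8), (1000, 8),
             (2000, 9), (2000, 9), (2000, 9), (300, 10)) as v(val, grp);

Only the 2000 rows satisfy val = local_max: the tied 1000 rows still see the larger neighbouring group and are correctly rejected.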

S-Man