SQL: Indexing/grouping events with dual clear condition

Question

(PostgreSQL 9.3) I have a table "events" with millions of complex events, stored as received by a device. For example purposes:

+-----------+-------+
| Timestamp | Event |
+-----------+-------+
| 1         | A     |
| 2         | A     |
| 2         | B     |
| 3         | B     |
| 10        | A     |
| 11        | A     |
| 11        | 0     |
| 11        | C     |
| 12        | A     |
+-----------+-------+

In this case I have four different kinds of events: A, B, C and 0. What I want to do is index them so that I have start/stop timestamps for each event. The stop conditions are: the event is no longer being reported at a given timestamp, OR a "0" event came in, clearing all of them. Final output:

+------+----+-------+
| From | To | Event |
+------+----+-------+
| 1    | 3  | A     |
| 2    | 10 | B     |
| 10   | 11 | A     |
| 11   | 11 | C     |
| 12   |    | A     |
+------+----+-------+

In this case, A was raised at 1 and cleared at 3 because it was no longer being reported at that moment. B was raised at 2 and cleared at 10 for a similar reason. A was raised again at 10 and cleared at 11 by the 0 event (despite being reported at that time too!). C was raised at 11 AND cleared at the same time (some ordering will be needed to handle a 0 at the same timestamp). Lastly, A was raised again at 12 and is currently active, so it gets a NULL end timestamp.

I do have something that works, but it is CTE-heavy and as such doesn't scale well to millions of records. I have been experimenting with LATERAL (with great results) and I am open to any 9.3-specific recommendations. Also, the "event" itself has been greatly simplified for this question; in fact it is a complex group of columns. It's possible window functions could apply here too.

Your first example entry looks wrong, it should be "to 3" not "to 2". It's inconsistent with all the others. — Craig Ringer, Mar 29 '14 at 02:32
No time to write a proper answer right now, but you're right to look at window functions. Test `event = lag(event) OVER (ORDER BY timestamp)`; this lets you detect *edges* where changes happen. There are quite a lot of similar questions here on Stack Overflow btw, where people want to produce a resultset with the edges where contiguous series of values change; finding them might be somewhat tricky. — Craig Ringer, Mar 29 '14 at 02:34
Actually I don't see "to 2" in the first indexed event? But thanks for looking at my request - I was hoping either you or Erwin could push me in the right direction. [Here's a similar question](http://stackoverflow.com/questions/12480818/jump-sql-gap-over-specific-condition-proper-lead-usage) but somehow I can't figure out how to apply the same logic here. — Jeff, Mar 30 '14 at 02:59
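
The edge test suggested in the comments can be extended to cover both clear conditions. What follows is a minimal sketch, not a tuned solution. It assumes a hypothetical simplified table `events(ts, event)` with at most one row per `(ts, event)` pair, and the view name `event_runs` is likewise made up for illustration. The idea is to compute, per distinct timestamp, whether a "0" arrived and what the neighbouring timestamps are, and then flag each report as the start and/or end of a run with `lag()`/`lead()`. It uses only derived tables and window functions, so it should run on 9.3.

```sql
-- Hypothetical stand-in for the real table: one row per (ts, event) report.
CREATE TEMP TABLE events (ts int NOT NULL, event text NOT NULL);
INSERT INTO events (ts, event) VALUES
    (1,'A'), (2,'A'), (2,'B'), (3,'B'), (10,'A'),
    (11,'A'), (11,'0'), (11,'C'), (12,'A');

CREATE TEMP VIEW event_runs AS
SELECT from_ts, to_ts, event
FROM (
    SELECT event, is_end,
           -- the latest run start at or before this row is the start of its run
           max(CASE WHEN is_start THEN ts END)
               OVER (PARTITION BY event ORDER BY ts) AS from_ts,
           -- a '0' clears at the same timestamp, otherwise at the next timestamp
           CASE WHEN has_clear THEN ts ELSE next_ts END AS to_ts
    FROM (
        SELECT e.ts, e.event, t.has_clear, t.next_ts,
               -- start: first report, or a gap since the previous timestamp,
               -- or a '0' at the previous timestamp
               (lag(e.ts) OVER w IS NULL
                OR lag(e.ts) OVER w <> t.prev_ts
                OR t.prev_clear) AS is_start,
               -- end: a '0' at this timestamp, or not reported at the next one
               (t.has_clear
                OR lead(e.ts) OVER w IS NULL
                OR lead(e.ts) OVER w <> t.next_ts) AS is_end
        FROM events e
        JOIN (
            -- one row per distinct timestamp, with its neighbours
            SELECT ts,
                   bool_or(event = '0') AS has_clear,
                   lag(ts) OVER (ORDER BY ts) AS prev_ts,
                   lag(bool_or(event = '0')) OVER (ORDER BY ts) AS prev_clear,
                   lead(ts) OVER (ORDER BY ts) AS next_ts
            FROM events
            GROUP BY ts
        ) t ON t.ts = e.ts
        WHERE e.event <> '0'
        WINDOW w AS (PARTITION BY e.event ORDER BY e.ts)
    ) flagged
) runs
WHERE is_end;

SELECT * FROM event_runs ORDER BY from_ts, event;
```

Traced by hand against the sample data, this yields the five rows of the expected output, including the NULL end for the A that is still active at 12. Whether it scales to millions of rows would need testing on the real table.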

score 0 · Accepted Answer · answered Apr 01 '14 at 21:57

Thinking outside the box here: why not maintain the summary table with a trigger?

Here is an example for your case (FKs etc. omitted):

create table event_type (
    event_type_id serial,
    event_name varchar(255)
);

create table event (
    event_time timestamp(0),
    event_type_id int
);

create table event_summary (
    event_summary_id serial,
    sum_from timestamp(0),
    sum_to timestamp(0),
    event_type_id int
);

-- plpgsql is installed by default since PostgreSQL 9.0, so no "create language" is needed

create or replace function event_insertion() returns trigger as $$
    declare
        var_event_summary_id integer;
    begin
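        -- find a summary for this event type whose last report was within the
        -- previous second (the '0' clear-all event gets no special handling
        -- here; it is summarised like any other event type)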
        select
            event_summary_id
        into
            var_event_summary_id
        from
            event_summary s
        where
            new.event_type_id = s.event_type_id
            and sum_to >= new.event_time - interval '1 seconds';

        if found then
            -- update the existing summary to include this timestamp
            update event_summary
               set sum_to = new.event_time
             where event_summary_id = var_event_summary_id;
        else
            -- create a new summary for just this timestamp
            insert into event_summary (sum_from, sum_to, event_type_id)
            values (new.event_time, new.event_time, new.event_type_id);
        end if;

        return null;
    end;
$$ language plpgsql;

create trigger event_insertion after insert on event
    for each row execute procedure event_insertion();

-- some initial data
insert into event_type(event_name) values ('a');
insert into event_type(event_name) values ('b');
insert into event_type(event_name) values ('c');
insert into event_type(event_name) values ('0');

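-- fire the events (run these outside an explicit transaction block:
-- now() is fixed at transaction start, so the sleeps would have no effect)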
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'a'));
select pg_sleep(1);
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'a'));
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'b'));
select pg_sleep(1);
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'b'));
select pg_sleep(7);
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'a'));
select pg_sleep(1);
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'a'));
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = '0'));
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'c'));
select pg_sleep(1);
insert into event(event_time,event_type_id) values (now(),(select event_type_id from event_type where event_name = 'a'));

-- query the summary table
select
    extract(seconds from s.sum_from),
    extract(seconds from s.sum_to),
    t.event_name
from
    event_summary s
    inner join event_type t on (t.event_type_id = s.event_type_id);
