
I've greatly simplified the examples to hopefully produce a clear enough question that can be answered:

Consider a table of events

CREATE TABLE alertable_events
(
  unique_id text NOT NULL DEFAULT ''::text,
  generated_on timestamp without time zone NOT NULL DEFAULT now(),
  message_text text NOT NULL DEFAULT ''::text,
  CONSTRAINT pk_alertable_events PRIMARY KEY (unique_id)
);

with the following data:

COPY alertable_events (unique_id,message_text,generated_on) FROM stdin;
one message one 2014-03-20 06:00:00.000000
two message two 2014-03-21 06:00:00.000000
three   message three   2014-03-22 06:00:00.000000
four    message four    2014-03-23 06:00:00.000000
five    message five    2014-03-24 06:00:00.000000
\.

And for each event, there is a list of fields

CREATE TABLE alertable_event_fields
(
  unique_id text NOT NULL DEFAULT ''::text,
  field_name text NOT NULL,
  field_value text NOT NULL DEFAULT ''::text,
  CONSTRAINT pk_alertable_event_fields PRIMARY KEY (unique_id, field_name),
  CONSTRAINT fk_alertable_event_fields_0 FOREIGN KEY (unique_id)
      REFERENCES alertable_events (unique_id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
);

with the following data:

COPY alertable_event_fields (unique_id,field_name,field_value) FROM stdin;
one field1  a
one field2  b
two field1  z
two field2  y
three   field1  a
three   field2  m
four    field1  a
four    field2  b
five    field1  z
five    field2  y
\.

I want to define a view that produces the following:

| unique_id | fields | message_text  | generated_on               | updated_on                 | count |
| five      | z|y    | message five  | 2014-03-21 06:00:00.000000 | 2014-03-24 06:00:00.000000 | 2     |
| four      | a|b    | message four  | 2014-03-20 06:00:00.000000 | 2014-03-23 06:00:00.000000 | 2     |
| three     | a|m    | message three | 2014-03-22 06:00:00.000000 | 2014-03-22 06:00:00.000000 | 1     |

Notably:

  1. fields is a pipe-delimited string (or any serialization) of the field values (a json encoding of field_name:field_value pairs would be even better ... but I can work with pipe-delimited for now)
  2. the output is grouped by matching fields. Update 3/30 12:45am The values are ordered alphabetically by their field_name, therefore a|b would not match b|a (see the sketch right after this list)
  3. a count is produced of the events that match that field set. Update 3/30 12:45am There can be a different number of fields per unique_id; a match requires matching all fields, not a subset of them.
  4. generated_on is the timestamp of the first event
  5. updated_on is the timestamp of the most recent event
  6. message_text is the message_text of the most recent event
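
To make the matching rule concrete against the sample data above, here's a throwaway query (just an illustration of the grouping, not the view I'm after):

SELECT fields, count(*) AS events
FROM  (
      SELECT unique_id,
             array_to_string(array_agg(field_value ORDER BY field_name), '|') AS fields
      FROM   alertable_event_fields
      GROUP  BY unique_id
     ) per_event
GROUP  BY fields;
-- With the sample data: 'a|b' -> 2 (one, four), 'z|y' -> 2 (two, five), 'a|m' -> 1 (three)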

I've produced this view, and it works for small data sets; however, as the alertable_events table grows, it becomes exceptionally slow. I can only assume I'm doing something wrong in the view because I have never dealt with anything quite so ugly.

Update 3/30 12:15PM EDT It looks like I may have server tuning problems causing these high run-times; see the added explain for more info. If you see a glaring issue there, I'd be greatly interested in tweaking the server's configuration.

Can anyone piece together a view that handles large datasets well and has a significantly better run time than this? Perhaps using hstore? (I'm running 9.2 preferably, though 9.3 if I can have a nice json encoding of the fields.)
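
To illustrate what I mean by "using hstore", here's roughly the per-event aggregation I have in mind (untested sketch; it assumes the hstore extension is installed):

CREATE EXTENSION IF NOT EXISTS hstore;

-- Collapse each event's fields into one hstore keyed by field_name.
-- hstore values compare equal regardless of key order, which would
-- sidestep the a|b vs b|a ordering concern.
SELECT unique_id,
       hstore(array_agg(field_name  ORDER BY field_name),
              array_agg(field_value ORDER BY field_name)) AS fields
FROM   alertable_event_fields
GROUP  BY unique_id;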

Updated 3/30 11:30AM I'm beginning to think my issue may be server tuning (which means I'll need to talk to the SA). Here's a very simple explain (analyze,buffers) which shows a ridiculous run-time for as few as 8k rows in alertable_event_fields

Update 3/30 7:20PM I bumped work_mem to 5MB using SET work_mem='5MB' (which is plenty for the query below); strangely, even though the planner switched to an in-memory quicksort, it actually took on average 100ms longer!

explain (analyze,buffers) 
SELECT a.unique_id,
       array_to_string(array_agg(a.field_value order by a.field_name),'|') AS "values"
FROM alertable_event_fields a
GROUP BY a.unique_id;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=771.11..892.79 rows=4056 width=80) (actual time=588.679..630.989 rows=4056 loops=1)
   Buffers: shared hit=143, temp read=90 written=90
   ->  Sort  (cost=771.11..791.39 rows=8112 width=80) (actual time=588.591..592.622 rows=8112 loops=1)
         Sort Key: unique_id
         Sort Method: external merge  Disk: 712kB
         Buffers: shared hit=143, temp read=90 written=90
         ->  Seq Scan on alertable_event_fields a  (cost=0.00..244.40 rows=8112 width=80) (actual time=0.018..5.478 rows=8112 loops=1)
               Filter: (message_name = 'LIMIT_STATUS'::text)
               Buffers: shared hit=143
 Total runtime: 632.323 ms
(10 rows)

Update 3/30 4:10AM EDT I'm still not completely satisfied and would be interested in any further optimization. I have a requirement to support 500 msgs/sec steady state, and although most of those should not be "events", I get a little backlogged right now when stress testing.

Update 3/30 12:00PM EDT Here's my most readable iteration yet; unfortunately, for 4000 rows I'm still looking at 600ms runtimes! ... (see above, as the time is mostly spent in the innermost query) Any help here would be greatly appreciated

CREATE OR REPLACE VIEW views.unduplicated_events AS 
 SELECT a.unique_id,a.message_text,
        b."values",b.generated_on,b.updated_on,b.count
 FROM alertable_events a
 JOIN (
       SELECT b."values", 
              min(a.generated_on) AS generated_on,
              max(a.generated_on) AS updated_on,
              count(*) AS count
       FROM alertable_events a
       JOIN ( 
             SELECT a.unique_id,
                    array_to_string(array_agg(a.field_value order by a.field_name),'|') AS "values"
             FROM alertable_event_fields a
             GROUP BY a.unique_id
            ) b USING (unique_id)
       GROUP BY b."values"
 ) b ON a.generated_on=b.updated_on
 ORDER BY updated_on DESC;

Update 3/30 12:00PM EDT removed old stuff as this is getting too long

Reed Debaets
  • I've gotten around needing to select on views.unduplicated_events by creating a table unduplicated_events that stores only the values ... it's not pleasant, but it at least means no backlog on inserts, and now the view is only accessed when loading a website to view the data ... at 2000 rows it's still hitting around 1.5s though, so I'd be very interested in optimization of this view ... I'm looking to reuse the unduplicated_events table to help some at this point. – Reed Debaets Mar 30 '14 at 00:50
    `the output is grouped by matching fields` Please define exactly what is considered a match. Are there always exactly two rows per `unique_id`? Is `(a|b)` considered to match `(b|a)`? – Erwin Brandstetter Mar 30 '14 at 02:53
  • I was hoping to attract your attention Erwin, whether you know it or not, your answers on StackOverflow have been helping me for the past few months! When I string_agg the fields I use an `ORDER BY a.unique_id, upper(a.field_name)` so that the fields are always displayed in alphabetical order. Therefore `b|a` would not match `a|b` as they would be referring to different field names. There can be any number of fields, and a match requires matching all fields (not a subset). If you think there is too much ambiguity still, I can switch to `field1=a|field2=b` format – Reed Debaets Mar 30 '14 at 04:41
  • Found http://stackoverflow.com/questions/15261087/aggregate-functions-on-multiple-joined-tables ... I think this has merit to my question. Will see if I can take it as a stepping stone. – Reed Debaets Mar 30 '14 at 14:24
  • Your query does not match the table definition. (`message_text` <-> `message_name`). Typo? – Erwin Brandstetter Mar 30 '14 at 23:10
  • crap, thought I had sanitized it out. message_name is another column in the actual app ... apparently failed to delete it out in latest iteration. (alertable_event_fields actually has a compound key of message_name,field_name ... but for this example its unimportant and have been removing it) – Reed Debaets Mar 30 '14 at 23:16

1 Answer


Some pointers

Invalid query

Your current query is incorrect unless generated_on is unique, which is not declared in the question and probably is not the case:

CREATE OR REPLACE VIEW views.unduplicated_events AS 
SELECT ...
FROM alertable_events a
JOIN (   ...  ) b ON a.generated_on=b.updated_on  -- !! unreliable

Possibly faster

SELECT DISTINCT ON (f.fields) 
       unique_id                   -- most recent
     , f.fields
     , e.message_text              -- most recent
     , min(e.generated_on) OVER (PARTITION BY f.fields) AS generated_on -- "first"
     , e.generated_on                                   AS updated_on   -- most recent
     , count(*)            OVER (PARTITION BY f.fields) AS ct
FROM   alertable_events e
JOIN  (
   SELECT unique_id, array_to_string(array_agg(field_value), '|') AS fields
   FROM  (
      SELECT unique_id, field_value
      FROM   alertable_event_fields
      ORDER  BY 1, field_name   -- a bit of a hack, but much faster
      ) f
   GROUP  BY 1
   ) f USING (unique_id)
ORDER  BY f.fields, e.generated_on DESC;

SQL Fiddle.

The result is currently sorted by fields. If you need a different sort order, you'd need to wrap it in another subquery ...
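
For instance, to get the updated_on DESC order mentioned in the comments, wrap it like this (same query as above, just nested; a sketch, not re-benchmarked):

SELECT *
FROM  (
   SELECT DISTINCT ON (f.fields)
          unique_id
        , f.fields
        , e.message_text
        , min(e.generated_on) OVER (PARTITION BY f.fields) AS generated_on
        , e.generated_on                                   AS updated_on
        , count(*)            OVER (PARTITION BY f.fields) AS ct
   FROM   alertable_events e
   JOIN  (
      SELECT unique_id, array_to_string(array_agg(field_value), '|') AS fields
      FROM  (
         SELECT unique_id, field_value
         FROM   alertable_event_fields
         ORDER  BY 1, field_name
         ) f
      GROUP  BY 1
      ) f USING (unique_id)
   ORDER  BY f.fields, e.generated_on DESC
   ) sub
ORDER  BY updated_on DESC;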

Major points

  • The output column name generated_on conflicts with the input column generated_on. You have to table-qualify the column e.generated_on to refer to the input column. I added table-qualification everywhere to make it clear, but it is only actually necessary in the ORDER BY clause. The manual:

    If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.

    The updated query should also be faster (as intended all along). Run EXPLAIN ANALYZE again.

  • For the whole query, indexes will hardly be of use. Only if you select specific rows ... One possible exception: a covering index for alertable_event_fields:

      CREATE INDEX f_idx1
      ON alertable_event_fields (unique_id, field_name, field_value);
    

    Lots of write operations might void the benefit, though.

  • array_agg(field_value ORDER BY ...) tends to be slower for big sets than pre-sorting in a subquery.

  • DISTINCT ON is convenient here. Not sure whether it's actually faster, though, since ct and generated_on have to be computed in separate window functions, which requires another sort step.

  • work_mem: setting it too high can actually harm performance. More in the Postgres Wiki or in "Craig's list".
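
    If you do experiment with work_mem, scope it narrowly rather than raising it server-wide; a minimal sketch (the role name is just a placeholder):

      BEGIN;
      SET LOCAL work_mem = '8MB';   -- only affects the current transaction
      -- ... run the reporting query here ...
      COMMIT;

      ALTER ROLE report_reader SET work_mem = '8MB';  -- hypothetical role; applies to its future sessions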

Generally this is hard to optimize. Indexes fail because the sort order depends on two tables. If you can work with a snapshot, consider a MATERIALIZED VIEW.
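
A minimal sketch of the materialized-view route (requires PostgreSQL 9.3+; the name is just an example, and it snapshots the view from the question):

-- Reads hit the stored result instead of recomputing the aggregation.
CREATE MATERIALIZED VIEW views.unduplicated_events_snapshot AS
SELECT * FROM views.unduplicated_events;

-- Recompute on whatever schedule the application can tolerate.
-- Plain REFRESH locks out readers while it runs (on 9.3).
REFRESH MATERIALIZED VIEW views.unduplicated_events_snapshot;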

Erwin Brandstetter
  • Can you run this with your data to verify it's actually faster? – Erwin Brandstetter Mar 30 '14 at 23:34
  • just tested it ... return on 4500 rows is ~800ms for this query (it's right at that for the current view definition as well) – Reed Debaets Mar 31 '14 at 00:03
  • I would prefer a different sort order (updated_on DESC) ... though that isn't specified as part of the original problem – Reed Debaets Mar 31 '14 at 00:05
  • When I discovered that the basic agg function was taking up most of the time I figured "hard to optimize" would be the answer. Unfortunately, as you expect, I'm constantly writing to and viewing this table. I'll just develop around the slow query. Accepted for the links that give more info and for catching the invalid query, and again I appreciate all the help in other questions around SO... – Reed Debaets Mar 31 '14 at 01:49
  • Should I expect my performance to degrade linearly? If I have 40k messages, would my 800ms become 8000ms? – Reed Debaets Mar 31 '14 at 02:04
  • Maybe this plays into your comments regarding "not a lot of writes" ... but I noticed at a point when I had 9k rows or so in the archive, I ran the view and it returned in <100ms ... repeatedly. I checked the explain and it looks like it is doing an index scan during the agg instead of a seq scan ... is this something the planner will do automatically at some point? Can I force an index scan without affecting inserts? ... If I'm too late to catch you here, I'll post a separate question. – Reed Debaets Mar 31 '14 at 05:00
  • I think there is an issue with your query, Erwin, because when I change `ORDER BY fields, generated_on DESC;` to `ORDER BY fields, generated_on ASC;` I get the same unique_id ... – Reed Debaets Apr 01 '14 at 16:19
    @ReedDebaets: Ah, right. Sorry, a tricky one. Consider the updated answer and this working [fiddle with ascending sort order](http://sqlfiddle.com/#!15/18dc9/8) – Erwin Brandstetter Apr 01 '14 at 18:20