1

I have a very simple query that executed surprisingly slow and the reasons turned out that it was doing a scan on the view after doing a JOIN. I was a bit surprised to see that, as I was expecting Postgres to do the filtering before joining, seeing that one of the underlying tables had an INDEX on the column that was to be filtered.

Is there a way I can reorder the query somehow or hint to the planner how to do the query differently?

Mind you, I do know how to hack around the issue by directly accessing the underlying tables, but the view hides a bit of complexity it would be nice to keep out of the queries.

The query

select * from form where encounter_id= 23728 and type = 'vitals';

The EXPLAIN ANALYZE

Subquery Scan on form  (cost=0.57..3439.07 rows=1 width=622) (actual time=8.187..8.187 rows=0 loops=1)
  Filter: ((form.encounter_id = 23728) AND (form.type = 'vitals'::text))
  Rows Removed by Filter: 12000
  ->  Unique  (cost=0.57..3259.07 rows=12000 width=626) (actual time=0.008..7.612 rows=12000 loops=1)
        ->  Merge Join  (cost=0.57..3229.07 rows=12000 width=626) (actual time=0.007..5.485 rows=12000 loops=1)
              Merge Cond: (fd.form_id = f.id)
              ->  Index Scan using _idx_form_details on _form_details fd  (cost=0.29..2636.78 rows=12000 width=603) (actual time=0.003..1.918 rows=12000 loops=1)
              ->  Index Scan using pk_form on _form f  (cost=0.29..412.29 rows=12000 width=27) (actual time=0.002..1.214 rows=12000 loops=1)
Planning time: 0.170 ms
Execution time: 8.212 ms

The TABLE and VIEW definitions

CREATE TABLE _form (
  id INT NOT NULL,
  encounter_id INT REFERENCES _encounter (id)            NOT NULL,
  type         TEXT                                      NOT NULL,
  CONSTRAINT pk_form PRIMARY KEY (id),
  FOREIGN KEY (cid) REFERENCES _user_in_role (id)
);

CREATE INDEX encounter_id ON _form (encounter_id, type);

CREATE TABLE _form_details (
  id INT NOT NULL,
  form_id   INT REFERENCES _form (id) NOT NULL,
  archived  BOOLEAN                   NOT NULL DEFAULT FALSE,
  CONSTRAINT pk_form_details PRIMARY KEY (id),
  FOREIGN KEY (cid) REFERENCES _user_in_role (id)
);

CREATE VIEW form AS
  SELECT DISTINCT ON (f.id)
    f.id,
    f.encounter_id,
    f.type,
    fd.archived,
    f.cid
  FROM _form f 
       JOIN _form_details fd 
         ON (f.id = fd.form_id)
  ORDER BY f.id, fd.id DESC;

EDIT: Someone posted an answer (subsequently deleted) which contained an important tidbit of information: even though the encounter_id column in the underlying table is indexed, the ORDER BY operation in the VIEW defeats its purpose. Unfortunately, we cannot rid ourselves of ORDER BY as it is required for the DISTINCT ON to work.

oligofren
  • 20,744
  • 16
  • 93
  • 180
  • 3
    8 milliseconds doesn't really strike me as "slow" - how fast do you need that to be? –  Feb 14 '19 at 12:50
  • Similar questions https://stackoverflow.com/questions/27142333/index-is-ignored-on-joined-view-in-postgres, https://stackoverflow.com/questions/18134798/why-does-my-view-in-postgresql-not-use-the-index, – oligofren Feb 14 '19 at 12:57
  • @a_horse_with_no_name Everything is relative. This is O(N) performance, meaning it scales linearly with the number of forms. This is just with a test data set I created with approx 300 forms for a certain `encounter_id` (out of 12000). I expect there to be a significant multiple of these numbers, as well as this query running several hundred times per second, when in production. _Mind your nanoseconds_ (look it up). – oligofren Feb 14 '19 at 13:01
  • @a_horse_with_no_name Well, it's already too slow at 8ms. We are using version 10.6. – oligofren Feb 14 '19 at 13:08
  • Did you try an index on `(type, encounter_id)`? Do you always query for the same `type` value? Then a filtered index using `where type = 'vitals'` might help. If you have just a few distinct values for `type` creating one filtered index for each type might help as well. What is your value for `work_mem` does it help if you increase that? –  Feb 14 '19 at 13:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/188425/discussion-between-oligofren-and-a-horse-with-no-name). – oligofren Feb 14 '19 at 13:52
  • @a_horse_with_no_name The solution you posted in the chat using the lateral join was clearly the fastest. Want to add your answer and I'll award you the points. – oligofren Feb 19 '19 at 11:22

2 Answers2

2
  • The DISTINCT ON ... ORDER BY is a performance killer(the subquery cannot be broken up)
  • there is an index missing for the form_id INT REFERENCES _form (id) FK
  • aNOT EXISTS() antijoin, or row_number() can be used to avoid the DISTINCT subquery

SET search_path=tmp;
/***/
\i tmp.sql

CREATE TABLE tform (
  id INT NOT NULL
  , encounter_id INT NOT NULL -- REFERENCES tencounter (id)
  , ztype         TEXT                                      NOT NULL
  , CONSTRAINT pk_form PRIMARY KEY (id)
  -- FOREIGN KEY (cid) REFERENCES _user_in_role (id)
);

CREATE TABLE tform_details (
  id INT NOT NULL
  , form_id   INT REFERENCES tform (id) NOT NULL
  , archived  BOOLEAN                   NOT NULL DEFAULT FALSE
  , CONSTRAINT pk_form_details PRIMARY KEY (id)
  -- , FOREIGN KEY (cid) REFERENCES _user_in_role (id)
);

-- ALTER TABLE tform ADD FOREIGN KEY(encounter_id) REFERENCES tencounter (id) ;
CREATE INDEX encounter_id ON tform (encounter_id, ztype);

INSERT INTO tform (id, encounter_id, ztype)
SELECT gs, 23720+gs%29, 'ztype_' || gs::text
FROM generate_series(1,10000) gs
        ;

INSERT INTO tform_details (id, form_id, archived)
SELECT 10000*gs+tf.id, tf.id, (random() > 0.3) ::boolean
FROM tform tf
CROSS JOIN generate_series(0,22) gs
        ;


UPDATE tform
SET ztype = 'vitals'
WHERE random() < 0.2;

/***/
DROP INDEX xxxx ;
CREATE UNIQUE INDEX xxxx ON tform_details (form_id, id);

VACUUM ANALYZE tform;
VACUUM ANALYZE tform_details;

\d tform;
\d tform_details;

select COUNT(*) FROM tform;
select COUNT(*) FROM tform_details;

DROP VIEW form ;
CREATE VIEW form AS
  SELECT DISTINCT ON (f.id)
    f.id
    , f.encounter_id
    , f.ztype
    , fd.archived
    -- , f.cid
  FROM tform f
       JOIN tform_details fd ON f.id = fd.form_id
  ORDER BY f.id, fd.id DESC
        ;

DROP VIEW form2 ;
CREATE VIEW form2 AS
  SELECT f.id
    , f.encounter_id
    , f.ztype
    , fd.archived
  FROM tform f
  JOIN tform_details fd
    ON f.id = fd.form_id
  WHERE NOT EXISTS ( SELECT *
        FROM tform_details nx
        WHERE nx.form_id = fd.form_id
        AND nx.id > fd.id
        )
        ;

DROP VIEW form3 ;
CREATE VIEW form3 AS
  SELECT f.id
    , f.encounter_id
    , f.ztype
    , fd.archived
  FROM tform f
  JOIN ( select xx.form_id, xx.archived
        , row_number() OVER (PARTITION BY xx.form_id ORDER BY xx.id DESC) AS rn
         FROM tform_details xx
        ) fd ON f.id = fd.form_id AND fd.rn = 1
        ;

\echo burn-in
EXPLAIN ANALYZE
select * from form where encounter_id= 23728 and ztype = 'vitals' ;

\echo plain
EXPLAIN ANALYZE
select * from form where encounter_id= 23728 and ztype = 'vitals' ;

EXPLAIN ANALYZE
select * from form2 where encounter_id= 23728 and ztype = 'vitals' ;

EXPLAIN ANALYZE
select * from form3 where encounter_id= 23728 and ztype = 'vitals' ;

\echo no_hash
SET enable_hashjoin = False;

EXPLAIN ANALYZE
select * from form where encounter_id= 23728 and ztype = 'vitals' ;

EXPLAIN ANALYZE
select * from form2 where encounter_id= 23728 and ztype = 'vitals' ;

EXPLAIN ANALYZE
select * from form3 where encounter_id= 23728 and ztype = 'vitals' ;
wildplasser
  • 43,142
  • 8
  • 66
  • 109
1

@a_horse_with_no_name gave me the fastest solution so far, in the chat, but never supplied an answer. So for reference, here's his solution, using a lateral join to create the view.

CREATE VIEW form AS
    SELECT f.id, 
        f.encounter_id, 
        f.type, 
        fd.archived, 
        f.cid 
    FROM _form f 
    JOIN LATERAL ( 
        SELECT form_id, archived 
        FROM _form_details _fd 
        WHERE _fd.form_id = f.id 
        ORDER BY _fd.id DESC 
        LIMIT 1 
    ) AS fd ON TRUE; 

This is about 10x faster than any other solutions. If created as form4, based on the same tables as the ones from @wildplasser, here's how it performs:

select * from form where encounter_id= 23728 and ztype = 'vitals' ;
Time: 181.065 ms
select * from form2 where encounter_id= 23728 and ztype = 'vitals' ;
Time: 12.395 ms
select * from form3 where encounter_id= 23728 and ztype = 'vitals' ;
Time: 122.305 ms
select * from form4 where encounter_id= 23728 and ztype = 'vitals' ;
Time: 1.305 ms

Some good pointers on lateral joins, introduced in Postgres 9.3:

oligofren
  • 20,744
  • 16
  • 93
  • 180