I have a problem where adding a simple test condition (which should be evaluated only once per query) results in the whole query taking ten times as long to return. I'm running PostgreSQL 9.2, and the tables in play are as follows:
CREATE TABLE tags (
tid int4 UNIQUE NOT NULL,
name text UNIQUE NOT NULL,
PRIMARY KEY (tid));
CREATE TABLE bugs (
bid int4 UNIQUE NOT NULL,
type int2 NOT NULL,
title text NOT NULL,
PRIMARY KEY (bid));
CREATE TABLE bug_tags (
bid int4 REFERENCES bugs(bid) NOT NULL,
tid int4 REFERENCES tags(tid) NOT NULL,
PRIMARY KEY (bid, tid));
CREATE INDEX bug_tags_bid_idx ON bug_tags (bid);
CREATE INDEX bug_tags_tid_idx ON bug_tags (tid);
The query must return the first 20 matching bugs ordered by bid, where the matching condition is that the bug's type must conform to a supplied bitmask (variable $TYPE), and the set of tags associated with the bug must be a superset of a supplied set of tagnames (denoted by $TAGNAMES in the prepared statement). After trying several approaches, below is the one that produces the best results. The basic idea is to first fetch all bugs satisfying the $TAGNAMES condition, and then test each one for the $TYPE condition. Moreover, if $TAGNAMES is empty, then we skip fetching those bugs and instead concentrate on just looping through the whole table (since we only want the first 20 rows, this is supposedly much faster). This latter test is the one that is causing the puzzling results.
WITH tids AS
(SELECT tid
FROM tags
WHERE name = ANY ($TAGNAMES :: text[])),
bugs_from_tags AS
(SELECT DISTINCT t1.bid
FROM bug_tags AS t1
WHERE NOT EXISTS
(SELECT *
FROM tids
WHERE NOT EXISTS
(SELECT *
FROM bug_tags AS t2
WHERE t2.tid = tids.tid AND t2.bid = t1.bid)))
SELECT bid, type, title
FROM bugs
WHERE (type & $TYPE <> 0) AND
(($TAGNAMES :: text[]) = '{}' OR bid IN (SELECT * FROM bugs_from_tags))
ORDER BY bid
LIMIT 20;
The version without the test is of course identical except for the WHERE clause, which looks like this instead:
WHERE (type & $TYPE <> 0) AND (bid IN (SELECT * FROM bugs_from_tags))
Below are the approximate ANALYZE results for a test database (with random data) consisting of 1,000,000 bugs, 2,000 tags, and 200,000 rows in bug_tags. The columns "with" and "without" refer to the query versions with/without the test against empty. The numbers are in milliseconds.
with without
len($TAGNAMES) = 0: 4 1500
len($TAGNAMES) = 1: 13000 1600
Obviously, adding the test condition pays off handsomely when len($TAGNAMES) = 0, but results in disaster otherwise. This is puzzling because the condition is independent of each row, and therefore should only be evaluated once. Moreover, the results of EXPLAIN ANALYZE (which are quite long) do show that the plans with/without used in the case when len($TAGNAMES) = 1 are very different, though they should be identical!
Anyway, I suspect I stumbled upon a quirk (or bug?) of PostgreSQL's query planner. Any ideas on how I can get around it?
NOTE: Results of EXPLAIN ANALYZE for both tests are here and here.