0

I've come across a few questions that are somewhat related to mine (e.g. 1, 2). None of them really answer the question though.

Basically my problem is that Postgres refuses to use an index in certain seemingly simple queries. I have managed to boil down my complex query to some examples below:

-- SETUP

CREATE TEMP TABLE test ( 
    id BIGSERIAL PRIMARY KEY,
    DATA INT 
);

INSERT INTO test(DATA)
SELECT RANDOM() FROM GENERATE_SERIES(1, 100000);

-- just in case the stats are outdated
VACUUM ANALYSE test;

-- QUERIES

-- correct but does a full table scan
EXPLAIN SELECT
    id,
    DATA
FROM
    test
WHERE
    (FALSE
    OR id IN (
    SELECT
        UNNEST(ARRAY[100]) 
    ));



-- simpler than above but still does a table scan
EXPLAIN SELECT
    id,
    DATA
FROM
    test
WHERE
    FALSE
    OR id IN (SELECT 100);

Output of EXPLAIN:

QUERY PLAN                                                
----------------------------------------------------------
Seq Scan on test  (cost=0.01..1791.01 rows=50000 width=12)
  Filter: (hashed SubPlan 1)                              
  SubPlan 1                                               
    ->  Result  (cost=0.00..0.01 rows=1 width=4)          
-- uses PRIMARY KEY index correctly but limited to 1 ID
-- NOTE that it uses "=" instead of "IN"
EXPLAIN SELECT
    id,
    DATA
FROM
    test
WHERE
    FALSE
    OR id = (SELECT 100 LIMIT 1);
QUERY PLAN                                                           
---------------------------------------------------------------------
Index Scan using test_pkey on test  (cost=0.30..8.32 rows=1 width=12)
  Index Cond: (id = $0)                                              
  InitPlan 1 (returns $0)                                            
    ->  Limit  (cost=0.00..0.01 rows=1 width=4)                      
          ->  Result  (cost=0.00..0.01 rows=1 width=4)               

uses PRIMARY KEY index correctly but I require a LEFT JOIN not an INNER JOIN. -- See next example.

EXPLAIN
WITH ids AS (
SELECT
    UNNEST(ARRAY[100]) AS ids )
SELECT
    id,
    DATA
FROM
    test
INNER JOIN ids ON
    id = ids.ids;
QUERY PLAN                                                                 
---------------------------------------------------------------------------
Nested Loop  (cost=0.29..8.34 rows=1 width=12)                             
  ->  ProjectSet  (cost=0.00..0.02 rows=1 width=4)                         
        ->  Result  (cost=0.00..0.01 rows=1 width=0)                       
  ->  Index Scan using test_pkey on test  (cost=0.29..8.31 rows=1 width=12)
        Index Cond: (id = (unnest('{100}'::integer[])))                    

I need a left join but this does a full table scan

EXPLAIN WITH ids AS (
  SELECT 100 AS id 
)
SELECT
    test.id,
    DATA
FROM
    test
LEFT JOIN ids ON
    test.id = ids.id;
QUERY PLAN                                                 
-----------------------------------------------------------
Seq Scan on test  (cost=0.00..1541.00 rows=100000 width=12)

So my questions:

  1. Why does FALSE OR ... IN ... prevent the use of an index, while FALSE OR ... = ... is fine?
  2. Why does the INNER JOIN use an index but LEFT JOIN does not in my example?
  3. Is there some way to rewrite the first query without using UNION or duplicating the whole query? (My actual query is much more complex)
geniass
  • 371
  • 1
  • 9
  • 20
  • 2
    You'll have to rewrite the query using `UNION`. [`OR` is a performance killer.](https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/) – Laurenz Albe Nov 20 '19 at 09:08
  • 2
    The left join does not limit the rows returned from the table `test` so all rows from that table are needed and a Seq Scan is the most efficient way to do that. –  Nov 20 '19 at 09:39
  • @LaurenzAlbe that link you posted was very helpful. It mentions a more efficient way to rewrite the OR IN part using ANY – geniass Nov 20 '19 at 10:32

1 Answers1

0
  1. Move the specified select near the table like :

    SELECT
      id, DATA
    FROM
      test, (SELECT UNNEST(ARRAY[100]) as id1) as sel
    WHERE
      (FALSE OR id = sel.id1);
    
  2. Broke or clauses using UNION ALL like:

    SELECT
      id, DATA
    FROM
      test
    WHERE
      FALSE -- first or predicate
    UNION ALL
    SELECT
      id, DATA
    FROM
      test
    WHERE id in (select UNNEST(ARRAY[100]);
    
  • `WHERE id in (select UNNEST(ARRAY[100]);` is better written as `where id = any(array[...])` –  Nov 20 '19 at 11:48
  • @a_horse_with_no_name yes that's what I found in the link that LaurenzAlbe posted – geniass Nov 20 '19 at 13:11
  • Solution 1 seems to be equivalent to an INNER JOIN I think, if the array is empty or null. – geniass Nov 20 '19 at 13:12
  • @a_horse_with_no, that construct is work with original select ...(.... where (false or id = any(array[...]) ... but I think that select was write as sample for another select with multiple rows ..... :) – Radu Melian Nov 25 '19 at 21:17