PostgreSQL not using index on WHERE IN but works with WHERE =

Question

I'm querying a table with a few million rows. When I use a subselect with =

select *
from "parcels" 
where "parcel_id" = (select "parcel_id" 
                     from "parcels_properties" 
                     where "property_id" = '178528')

the index is used and the query returns in 2 seconds.

Obviously, when the subquery returns more than one result, I need to use WHERE IN. Then it does not use the index and takes 10+ seconds.

select * 
from "parcels" 
where "parcel_id" in (select "parcel_id" 
                      from "parcels_properties" 
                      where "property_id" = '178528')

explain:

"Hash Join  (cost=82047.73..357097.98 rows=2019856 width=170)"
"  Hash Cond: ((parcels.parcel_id)::text = (parcels_properties.parcel_id)::text)"
"  ->  Seq Scan on parcels  (cost=0.00..241975.11 rows=4039711 width=170)"
"  ->  Hash  (cost=82045.23..82045.23 rows=200 width=38)"
"        ->  HashAggregate  (cost=82043.23..82045.23 rows=200 width=38)"
"              Group Key: (parcels_properties.parcel_id)::text"
"              ->  Gather  (cost=1000.00..81970.73 rows=28999 width=38)"
"                    Workers Planned: 2"
"                    ->  Parallel Seq Scan on parcels_properties  (cost=0.00..78070.83 rows=12083 width=38)"
"                          Filter: ((property_id)::text = '178528'::text)"

Why does line 3 of this EXPLAIN output use a Seq Scan on the parcels table instead of the parcel_id index?

One more thing: this is on AWS RDS.

If I run the same SQL on my local database with almost the same setup (the only difference is that RDS is v14 and local is v12), it uses the index and returns instantly.

Created index on parcels table:

parcel_id

Created index on parcels_properties table:

parcel_id
property_id

So could anyone help me out with this issue?

Thank you.

UPDATE

Doing the following more or less forces an index scan on the parcels table:

SET enable_seqscan = OFF;

SELECT *
FROM parcels p
WHERE EXISTS (
    SELECT 1
    FROM parcels_properties pp
    WHERE pp.parcel_id = p.parcel_id AND property_id = '178528'
);
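
A note on the setting above: a plain SET stays in effect for the rest of the session, so later queries are planned without sequential scans too. A minimal sketch of a scoped experiment, assuming the goal is only to compare plans (the table and column names come from the question; everything else is illustrative):

```sql
-- Sketch: limit the planner override to one transaction.
BEGIN;

-- SET LOCAL reverts automatically when the transaction ends.
SET LOCAL enable_seqscan = off;

-- EXPLAIN (ANALYZE, BUFFERS) runs the query and reports actual row counts
-- and timings next to the estimates, which helps spot bad estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM parcels p
WHERE EXISTS (
    SELECT 1
    FROM parcels_properties pp
    WHERE pp.parcel_id = p.parcel_id
      AND pp.property_id = '178528'
);

-- Nothing was modified; ending the transaction drops the override.
ROLLBACK;
```

If the plan is only fast with the override, that usually points at the planner's estimates rather than at the query itself.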

What are your indexes and execution plan? Could you show us more information? – D-Shih, Jan 28 '22 at 09:45
Please edit your question and restore the same column names you were originally using. Your edit has invalidated my answer. Thank you. – Tim Biegeleisen, Jan 28 '22 at 09:56
See https://stackoverflow.com/a/51476889/15603477: you should read the plan from the bottom up. Can you share the full EXPLAIN output? – jian, Jan 28 '22 at 13:25

Tim Biegeleisen · Answer 1 · 2022-01-28T09:52:10.980

1

You never told us which indices you have defined, but in any case I would express your query using exists logic:

SELECT *
FROM parcels p
WHERE EXISTS (
    SELECT 1
    FROM parcels_properties pp
    WHERE pp.parcel_id = p.parcel_id AND property_id = '12345'
);

This query should benefit from the following index on parcels_properties:

CREATE INDEX idx_pp ON parcels_properties (property_id, parcel_id);
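
One way to check whether the planner picks up the new index is to refresh the table's statistics and look at the plan of the inner lookup on its own. This is a sketch, assuming the index idx_pp above exists on parcels_properties (the table name used in the question) and reusing the property_id value from the question:

```sql
-- Refresh planner statistics for the table that was just indexed.
ANALYZE parcels_properties;

-- Plan for the inner lookup alone. With (property_id, parcel_id) indexed,
-- an index-based scan would be the hoped-for result here instead of the
-- Parallel Seq Scan shown in the question; the planner may still choose
-- otherwise depending on its estimates.
EXPLAIN
SELECT parcel_id
FROM parcels_properties
WHERE property_id = '178528';
```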

edited Jan 28 '22 at 09:52

answered Jan 28 '22 at 09:48

Tim Biegeleisen


I would put `property_id` first in the index since it is in the subquery predicate: `CREATE INDEX idx_pp ON parcels_properties (property_id, parcel_id);` – Radim Bača Jan 28 '22 at 09:51
@RadimBača I'll agree with your point, I have changed my answer. – Tim Biegeleisen Jan 28 '22 at 09:52
Hi @TimBiegeleisen, I have updated the question with more details. – Yunwei.W Jan 28 '22 at 09:54
@TimBiegeleisen The problem is that `FROM parcels_properties pp WHERE pp.parcel_id = p.parcel_id AND property_id = '12345'` can return multiple rows, so I have to use `WHERE IN`. – Yunwei.W Jan 28 '22 at 10:00
No, you _don't_ have to use `WHERE IN`, you can use my query instead. – Tim Biegeleisen Jan 28 '22 at 10:02
It still seems to be doing `-> Seq Scan on parcels p (cost=0.00..241975.11 rows=4039711 width=960)`. – Yunwei.W Jan 28 '22 at 10:09
What is wrong with that? The query doesn't know which parcel to retain until it has done a lookup for each one against the properties table. A scan on parcels is acceptable here. – Tim Biegeleisen Jan 28 '22 at 10:09
But the same database structure on my local db uses an index scan on parcels.parcel_id, which is 10 times faster. – Yunwei.W Jan 28 '22 at 10:14

score 0 · Answer 2 · answered Jan 28 '22 at 11:21


For anyone who might have this same problem in the future: try running VACUUM on your database.

I did a full VACUUM of the database, including ANALYZE, and it worked.

 vacuumdb --echo --full --verbose --analyze -h yourdbhost -p 5432 -U yourusername -d yourdbname
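
The command above does two things: --full rewrites every table (holding an exclusive lock on each while it runs) and --analyze refreshes the planner statistics. It may be that the statistics refresh alone is what changed the plan, although the thread does not confirm this. A lighter first step, sketched here with the two table names from the question, is to check when the tables were last analyzed and then run ANALYZE by itself:

```sql
-- When were the tables last vacuumed or analyzed, manually or by autovacuum?
SELECT relname,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze,
       n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('parcels', 'parcels_properties');

-- Refresh planner statistics only; this does not rewrite the tables.
ANALYZE parcels;
ANALYZE parcels_properties;
```

If the plan is still slow after ANALYZE, the full vacuumdb run remains an option.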

answered Jan 28 '22 at 11:21

Yunwei.W

