Problem:
I have a query that joins three tables. Whenever I run it after it has not been run for a while (say 24 hours), it takes a very long time to execute (about 1,066 s, roughly 18 minutes). From then on, it runs much faster (about 13.9 s, ~77x faster). I would like to know why the first run takes so long and how to fix it.
Table setup:
The tables are property_2, property_attribute_2, and property_address_2. Each is a partition of a larger table (property, property_attribute, and property_address, respectively). Rows in property_attribute_2 and property_address_2 reference property_2 through the property_id column. The columns involved in the joins (property_2.id, property_attribute_2.property_id, and property_address_2.property_id) are all indexed.
The query is:
select * from public.property_2 a
inner join public.property_attribute_2 b on a.id = b.property_id
left join public.property_address_2 c on a.id=c.property_id
The query plan (EXPLAIN ANALYZE output) for the first, slow run after the query has not been run for a while is:
Hash Right Join (cost=670010.33..983391.75 rows=2477776 width=185) (actual time=804159.499..1065892.338 rows=2477924 loops=1)
  Hash Cond: (c.property_id = a.id)
  -> Seq Scan on property_address_2 c (cost=0.00..131660.48 rows=4257948 width=72) (actual time=289.781..247906.955 rows=4257973 loops=1)
  -> Hash (cost=595483.13..595483.13 rows=2477776 width=117) (actual time=803833.183..803833.185 rows=2477921 loops=1)
        Buckets: 32768 Batches: 128 Memory Usage: 3165kB
        -> Hash Join (cost=94193.96..595483.13 rows=2477776 width=117) (actual time=98061.326..802753.642 rows=2477921 loops=1)
              Hash Cond: (a.id = b.property_id)
              -> Seq Scan on property_2 a (cost=0.00..265463.84 rows=6176884 width=105) (actual time=1349.284..696922.438 rows=4272433 loops=1)
              -> Hash (cost=48702.76..48702.76 rows=2477776 width=20) (actual time=95497.307..95497.308 rows=2477921 loops=1)
                    Buckets: 65536 Batches: 64 Memory Usage: 2624kB
                    -> Seq Scan on property_attribute_2 b (cost=0.00..48702.76 rows=2477776 width=20) (actual time=464.476..94126.890 rows=2477921 loops=1)
Planning time: 4.034 ms
Execution time: 1065995.827 ms
And the query plan for the subsequent, fast runs is:
Hash Right Join (cost=670010.33..983391.75 rows=2477776 width=185) (actual time=8828.873..13764.283 rows=2477924 loops=1)
  Hash Cond: (c.property_id = a.id)
  -> Seq Scan on property_address_2 c (cost=0.00..131660.48 rows=4257948 width=72) (actual time=0.050..1411.877 rows=4257973 loops=1)
  -> Hash (cost=595483.13..595483.13 rows=2477776 width=117) (actual time=8826.620..8826.623 rows=2477921 loops=1)
        Buckets: 32768 Batches: 128 Memory Usage: 3165kB
        -> Hash Join (cost=94193.96..595483.13 rows=2477776 width=117) (actual time=1356.224..7925.850 rows=2477921 loops=1)
              Hash Cond: (a.id = b.property_id)
              -> Seq Scan on property_2 a (cost=0.00..265463.84 rows=6176884 width=105) (actual time=0.034..2652.013 rows=4272433 loops=1)
              -> Hash (cost=48702.76..48702.76 rows=2477776 width=20) (actual time=1354.828..1354.829 rows=2477921 loops=1)
                    Buckets: 65536 Batches: 64 Memory Usage: 2624kB
                    -> Seq Scan on property_attribute_2 b (cost=0.00..48702.76 rows=2477776 width=20) (actual time=0.023..630.081 rows=2477921 loops=1)
Planning time: 1.181 ms
Execution time: 13872.977 ms
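To see where the extra time goes, the per-node "actual time" values from the two plans can be compared side by side. Below is a rough Python sketch. The helper `node_times` and the names `SLOW_PLAN` and `FAST_PLAN` are my own, the plan lines are copied verbatim from the two outputs, and the regular expressions assume the PostgreSQL text-format EXPLAIN ANALYZE lines exactly as shown. Only the Seq Scan nodes and the total execution time are extracted.

```python
import re

# Hypothetical helper (my own naming): extracts the total "actual time" in ms of
# each Seq Scan node, plus the overall execution time, from EXPLAIN ANALYZE text.
SEQ_SCAN = re.compile(r"Seq Scan on (\w+) .*?actual time=\d+\.\d+\.\.(\d+\.\d+)")
EXEC_TIME = re.compile(r"Execution time: (\d+\.\d+) ms")


def node_times(plan: str) -> dict[str, float]:
    # Map table name -> total time (ms) spent in its sequential scan.
    times = {m.group(1): float(m.group(2)) for m in SEQ_SCAN.finditer(plan)}
    total = EXEC_TIME.search(plan)
    if total:
        times["(execution)"] = float(total.group(1))
    return times


# Lines copied from the slow (first) run.
SLOW_PLAN = """\
-> Seq Scan on property_address_2 c (cost=0.00..131660.48 rows=4257948 width=72) (actual time=289.781..247906.955 rows=4257973 loops=1)
-> Seq Scan on property_2 a (cost=0.00..265463.84 rows=6176884 width=105) (actual time=1349.284..696922.438 rows=4272433 loops=1)
-> Seq Scan on property_attribute_2 b (cost=0.00..48702.76 rows=2477776 width=20) (actual time=464.476..94126.890 rows=2477921 loops=1)
Execution time: 1065995.827 ms
"""

# Lines copied from the fast (subsequent) run.
FAST_PLAN = """\
-> Seq Scan on property_address_2 c (cost=0.00..131660.48 rows=4257948 width=72) (actual time=0.050..1411.877 rows=4257973 loops=1)
-> Seq Scan on property_2 a (cost=0.00..265463.84 rows=6176884 width=105) (actual time=0.034..2652.013 rows=4272433 loops=1)
-> Seq Scan on property_attribute_2 b (cost=0.00..48702.76 rows=2477776 width=20) (actual time=0.023..630.081 rows=2477921 loops=1)
Execution time: 13872.977 ms
"""

slow = node_times(SLOW_PLAN)
fast = node_times(FAST_PLAN)
for node, slow_ms in slow.items():
    fast_ms = fast[node]
    # Columns: node, slow-run ms, fast-run ms, slowdown factor.
    print(f"{node:22s} {slow_ms:>12.1f} ms {fast_ms:>10.1f} ms {slow_ms / fast_ms:6.1f}x")
```

By this arithmetic, each sequential scan is roughly 150x to 260x slower on the first run, compared with about 77x for the query as a whole. The three scans together account for about 1,039 s of the 1,066 s total, so nearly all of the extra time is spent reading the tables rather than in the hash joins themselves.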
It is also worth noting that I have a couple of other Postgres databases on this machine, and various jobs regularly query different tables in those databases.