Why do seq/index scans take so long when running query after a while? How to make it fast?

Question

Problem:

I have a query that joins three tables. Whenever I run this query after a while (say, 24 hours), it takes a long time to execute. From then on, it executes really fast (~70x faster). I want to know why the first execution takes so long and how to fix it.

Table conditions:

The tables are property_2, property_attribute_2, and property_address_2. Each is a partition of a bigger table (i.e. property, property_attribute, and property_address). Rows in property_attribute_2 and property_address_2 reference property_2 through the column property_id. These columns (property.id, property_attribute_2.property_id, and property_address_2.property_id) are all indexed.

The query is:

select * from public.property_2 a 
inner join public.property_attribute_2 b on a.id = b.property_id 
left join public.property_address_2 c on a.id=c.property_id

The query plan when I run it after a while is:

Hash Right Join  (cost=670010.33..983391.75 rows=2477776 width=185) (actual time=804159.499..1065892.338 rows=2477924 loops=1)
  Hash Cond: (c.property_id = a.id)
  ->  Seq Scan on property_address_2 c  (cost=0.00..131660.48 rows=4257948 width=72) (actual time=289.781..247906.955 rows=4257973 loops=1)
  ->  Hash  (cost=595483.13..595483.13 rows=2477776 width=117) (actual time=803833.183..803833.185 rows=2477921 loops=1)
        Buckets: 32768  Batches: 128  Memory Usage: 3165kB
        ->  Hash Join  (cost=94193.96..595483.13 rows=2477776 width=117) (actual time=98061.326..802753.642 rows=2477921 loops=1)
              Hash Cond: (a.id = b.property_id)
              ->  Seq Scan on property_2 a  (cost=0.00..265463.84 rows=6176884 width=105) (actual time=1349.284..696922.438 rows=4272433 loops=1)
              ->  Hash  (cost=48702.76..48702.76 rows=2477776 width=20) (actual time=95497.307..95497.308 rows=2477921 loops=1)
                    Buckets: 65536  Batches: 64  Memory Usage: 2624kB
                    ->  Seq Scan on property_attribute_2 b  (cost=0.00..48702.76 rows=2477776 width=20) (actual time=464.476..94126.890 rows=2477921 loops=1)
Planning time: 4.034 ms
Execution time: 1065995.827 ms

And the query plan after the first run is:

Hash Right Join  (cost=670010.33..983391.75 rows=2477776 width=185) (actual time=8828.873..13764.283 rows=2477924 loops=1)
  Hash Cond: (c.property_id = a.id)
  ->  Seq Scan on property_address_2 c  (cost=0.00..131660.48 rows=4257948 width=72) (actual time=0.050..1411.877 rows=4257973 loops=1)
  ->  Hash  (cost=595483.13..595483.13 rows=2477776 width=117) (actual time=8826.620..8826.623 rows=2477921 loops=1)
        Buckets: 32768  Batches: 128  Memory Usage: 3165kB
        ->  Hash Join  (cost=94193.96..595483.13 rows=2477776 width=117) (actual time=1356.224..7925.850 rows=2477921 loops=1)
              Hash Cond: (a.id = b.property_id)
              ->  Seq Scan on property_2 a  (cost=0.00..265463.84 rows=6176884 width=105) (actual time=0.034..2652.013 rows=4272433 loops=1)
              ->  Hash  (cost=48702.76..48702.76 rows=2477776 width=20) (actual time=1354.828..1354.829 rows=2477921 loops=1)
                    Buckets: 65536  Batches: 64  Memory Usage: 2624kB
                    ->  Seq Scan on property_attribute_2 b  (cost=0.00..48702.76 rows=2477776 width=20) (actual time=0.023..630.081 rows=2477921 loops=1)
Planning time: 1.181 ms
Execution time: 13872.977 ms

Also worth noting that I have a couple of other Postgres databases on this machine and different jobs use different tables on these databases on a regular basis.

Erwin Brandstetter · Accepted Answer · 2021-04-10T13:53:23.063

If cold cache is the problem, as seems to be the case, you can warm it up before running the query. Postgres ships with the additional module pg_prewarm, which provides a range of tools to populate the cache.

Instructions on how to set it up are here:

PostgreSQL: Force data into memory
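
In short, setup is a one-time extension install per database. A minimal sketch, assuming the contrib modules are already installed alongside the server and that your role is allowed to create extensions:

```sql
-- One-time setup in the database that holds the tables.
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Confirm that the extension is registered.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_prewarm';
```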

Then you run something like:

SELECT pg_prewarm('public.property_2', 'prefetch');
SELECT pg_prewarm('public.property_attribute_2', 'prefetch');
SELECT pg_prewarm('public.property_address_2', 'prefetch');
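
The second argument selects how blocks are loaded. 'prefetch' issues asynchronous read requests to the operating system, 'read' reads the blocks synchronously into the OS cache, and 'buffer' (the default) loads them into PostgreSQL's shared buffers. The sketch below uses 'buffer' and then checks the result with the pg_buffercache extension, which is an assumption on my part and not something the answer relies on:

```sql
-- Load the table into shared buffers; the function returns the number
-- of blocks it processed. With a small shared_buffers setting this can
-- evict other cached data, so 'prefetch' or 'read' may be the gentler choice.
SELECT pg_prewarm('public.property_2', 'buffer') AS blocks_loaded;

-- Optional check (assumes the pg_buffercache contrib module is available).
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- How many blocks of each table currently sit in shared buffers?
SELECT c.relname,
       count(*) AS cached_blocks,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS cached_size
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
  AND c.relname IN ('property_2', 'property_attribute_2', 'property_address_2')
GROUP BY c.relname
ORDER BY cached_blocks DESC;
```

Note that pg_buffercache only sees shared buffers. Blocks that 'prefetch' or 'read' placed in the OS cache will not show up there.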

Of course, if you always run the same SELECT query without filter predicates, you might as well just run the same query to populate the cache, without using the fancy module. Possibly scheduled with a cron job?

... are all indexed.

As you can see in the EXPLAIN output, your indexes go unused. You fetch all rows without filter predicate, so indexes typically won't help. And you do it with SELECT *, i.e. get all columns from all joined tables, so index-only scans are out, too. You might improve performance by listing only the columns you actually need in the SELECT list.

Obviously, more RAM (and proper configuration for PostgreSQL buffer cache) can help, too.
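
As a rough sketch of what "proper configuration" could involve, you might inspect and adjust the buffer cache size. The value below is purely hypothetical and has to be chosen for your machine and its other workloads:

```sql
-- Inspect the current cache-related settings.
SHOW shared_buffers;
SHOW effective_cache_size;

-- Hypothetical value, not a recommendation.
-- ALTER SYSTEM requires superuser and writes to postgresql.auto.conf;
-- a change to shared_buffers only takes effect after a server restart.
ALTER SYSTEM SET shared_buffers = '4GB';
```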

Or you might be able to reduce RAM requirements with VACUUM (FULL) or, possibly, with an optimized table definition with proper column types and order. Not just for the tables at hand, also for other tables competing for the same resources (thereby evicting "your" blocks from the cache). See:

Calculating and saving space in PostgreSQL
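
To judge how much RAM the three tables would need, and whether shrinking them is worthwhile, you can compare their on-disk sizes. A minimal sketch using the built-in size functions:

```sql
-- On-disk size of each table, with and without indexes and TOAST data.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))       AS table_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relname IN ('property_2', 'property_attribute_2', 'property_address_2')
ORDER BY pg_total_relation_size(c.oid) DESC;
```

Since the query scans the tables sequentially and never touches the indexes, table_size is the number that matters for this particular query.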

Thank you so much. Yes, the cold cache was the problem, and it was solved using pg_prewarm. For more information: the tables are partitioned based on the required filters. For example, each partition stands for a city, so no further filters are required. Also, I don't use `select *`; I just removed the column names here to keep the focus on the problem. – Amir Moghadam, Apr 11 '21 at 06:03

score 0 · Answer 2 · answered Apr 10 '21 at 09:35

The difference must be caching: the first time, the data are read from disk; in subsequent runs, they are found in RAM. Run EXPLAIN (ANALYZE, BUFFERS) with track_io_timing = on to confirm that.
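
Applied to the query from the question, that check could look like the sketch below. It assumes your role is allowed to change track_io_timing for the session, which normally requires superuser rights:

```sql
-- Enable I/O timing for this session only.
SET track_io_timing = on;

-- Note that EXPLAIN ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
select * from public.property_2 a
inner join public.property_attribute_2 b on a.id = b.property_id
left join public.property_address_2 c on a.id = c.property_id;
```

In the output, each node gains a "Buffers:" line. On a cold run you would expect large "read" counts and noticeable I/O timings; on a warm run, mostly "hit" counts.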

However, it seems that either your I/O system is really slow or your tables are quite bloated. EXPLAIN (ANALYZE, BUFFERS) would show how many blocks are read, so you would know.

If bloat is indeed your problem, VACUUM (FULL) would help.
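
A minimal sketch of that step. The dead-tuple counters are estimates from the statistics collector and only a rough hint of bloat, so treat them as a first look rather than a measurement:

```sql
-- Rough bloat hint: many dead tuples relative to live ones, or no
-- recent (auto)vacuum, suggest the table may be larger than necessary.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public'
  AND relname IN ('property_2', 'property_attribute_2', 'property_address_2');

-- Rewrites the table into a compact new file and refreshes statistics.
-- This takes an ACCESS EXCLUSIVE lock (blocking reads and writes) and
-- temporarily needs disk space for a full copy of the table.
VACUUM (FULL, ANALYZE) public.property_2;
```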

Laurenz Albe


I already guessed that the difference arises from caching, but I'm looking for a way to make the first run faster, because the query will be executed through a pipeline once or twice a month. Is there anything I can do to decrease the run time of the first execution, e.g. by caching the data beforehand? Meanwhile, the buffers and table structure seem good: shared hits increase with each execution and I/O timing decreases. And the table is not bloated. – Amir Moghadam Apr 10 '21 at 10:22
