1

I have a 25mln rows "Zemla" table with index

CREATE INDEX zemla_level
  ON public."Zemla"
  USING btree
  (level);

Now I do simple query

select * from "Zemla" where level = 7

and get very hard query plan

Bitmap Heap Scan on "Zemla"  (cost=18316.26..636704.15 rows=978041 width=181) (actual time=216.681..758.663 rows=975247 loops=1)
  Recheck Cond: (level = 7)
  Heap Blocks: exact=54465
  ->  Bitmap Index Scan on zemla_level  (cost=0.00..18071.74 rows=978041 width=0) (actual time=198.041..198.041 rows=1949202 loops=1)
        Index Cond: (level = 7)

and another simple query which should be executed immediately when index present i think

select count(*) from "Zemla" where level = 7

Aggregate  (cost=639149.25..639149.26 rows=1 width=0) (actual time=1188.366..1188.366 rows=1 loops=1)
  ->  Bitmap Heap Scan on "Zemla"  (cost=18316.26..636704.15 rows=978041 width=0) (actual time=213.918..763.833 rows=975247 loops=1)
        Recheck Cond: (level = 7)
        Heap Blocks: exact=54465
        ->  Bitmap Index Scan on zemla_level  (cost=0.00..18071.74 rows=978041 width=0) (actual time=195.409..195.409 rows=1949202 loops=1)
              Index Cond: (level = 7)

My question is why PostgreSQL after first Index Scan does another Bitmap Heap Scan with so much overhead ?

Edit: What is a "Bitmap heap scan" in a query plan? is another question because it answers why some queries with OR operator has bitmap heap scan. My queries haven't neither OR nor AND operator

Community
  • 1
  • 1
alexey2baranov
  • 414
  • 5
  • 15
  • What do you mean with "hard plan"? show us the execution plan generated using `explain (analyze)` instead a plain explain. –  Mar 15 '17 at 15:44
  • fixed question body to explain (analyze) – alexey2baranov Mar 15 '17 at 16:25
  • 1
    The row estimates are quite off. Does running `vacuum analyze "Zemla"` change anything? –  Mar 15 '17 at 16:38
  • Possible duplicate of [What is a "Bitmap heap scan" in a query plan?](http://stackoverflow.com/questions/6592626/what-is-a-bitmap-heap-scan-in-a-query-plan) – Laurenz Albe Mar 15 '17 at 16:48

1 Answers1

1

If I am not mistaken, the bitmap Heap Scan is the algorithm fetching the data from the disk. It analyses all the disk-pages the engine has to fetch and sort them for minimal hard-drive head movement.

It takes time because your table must be very large and maybe highly fragmented on the disk.


For your second query count(*), PostgreSQL will still need to read the resulting rows to verify that they exist; other database systems may only need to reference the index in this situation. Check this page for more information:

https://wiki.postgresql.org/wiki/Index-only_scans


Try a VACCUM FULL on the table, and see if it speeds things up.

Guillaume F.
  • 5,905
  • 2
  • 31
  • 59