
I have a table with approximately 10 million rows, with the id column as the primary key.

Then I delete all rows where id > 10. Only 10 rows remain in the table.

Now, when I run the query `SELECT id FROM tablename`, the execution time is approximately 1.2 to 1.5 seconds.
But `SELECT id FROM tablename WHERE id = x` only takes 10 to 11 milliseconds.

Why is the first SELECT so slow for just 10 rows?

  • Have statistics been updated? – HoneyBadger Dec 15 '15 at 11:52
  • Did you run `vacuum analyze` after deleting the rows? You might even need to run `vacuum full analyze`. For performance-related questions you should always include the execution plan generated with `explain (analyze, verbose, buffers)`. – a_horse_with_no_name Dec 15 '15 at 11:53
  • @a_horse_with_no_name I want to accept your comment but I can't find a button for this. Thank you – Oto Shavadze Dec 15 '15 at 12:10

1 Answer


The main reason is the MVCC model of Postgres: deleted rows are kept until the system can be sure that the transaction will not be rolled back and that the dead rows are no longer visible to any concurrent transaction. Only then can dead rows be physically removed by VACUUM, or more radically by VACUUM FULL.
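
A minimal sketch of that cleanup, assuming the table is literally named tablename as in the question:

```sql
-- Remove dead row versions and refresh planner statistics.
VACUUM ANALYZE tablename;

-- Or rewrite the table completely to return disk space to the OS.
-- Caution: VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table.
VACUUM FULL ANALYZE tablename;
```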

Your simple query `SELECT id FROM tablename`, if run immediately after the DELETE and before autovacuum can kick in, still finds 10 million row versions and has to check visibility for each of them, only to rule out most of them.
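
This is easy to verify with the plan inspection suggested in the comments; the Buffers line shows how many pages the sequential scan still reads from the bloated table:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM tablename;
```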

Your second query `SELECT id FROM tablename WHERE id = x` can use the primary key index and only needs to read a single data page from the (formerly) big table. That kind of query is largely immune to the total size of the table either way.
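
For contrast, inspecting the point query the same way (using 1 as a stand-in for x) should show an Index Scan or Index Only Scan touching only a handful of pages:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM tablename WHERE id = 1;  -- 1 is a placeholder value for x
```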

There may be a (much) more efficient way to delete almost all 10 million rows:
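
The linked approach isn't reproduced here, but a common pattern, sketched under the assumption that no foreign keys, views, or triggers depend on the table, is to copy the few survivors instead of deleting the many victims:

```sql
BEGIN;

-- Copy the 10 surviving rows into a fresh, bloat-free table.
CREATE TABLE tablename_new AS
SELECT * FROM tablename
WHERE  id <= 10;

-- Swap the new table in; dropping the old one discards all bloat at once.
DROP TABLE tablename;
ALTER TABLE tablename_new RENAME TO tablename;

-- Recreate the primary key (and any other indexes or constraints).
ALTER TABLE tablename ADD PRIMARY KEY (id);

COMMIT;
```

Since only 10 rows are written, no dead tuples are produced at all and there is nothing left for VACUUM to clean up.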

– Erwin Brandstetter