1

When creating indexes on PostgreSQL tables, EXPLAIN ANALYZE followed by an SQL command shows which indexes are used.

For example:

EXPLAIN ANALYZE SELECT A,B,C FROM MY_TABLE WHERE C=123;

Returns:

Seq Scan on public.my_table (cost=...) <- No index, BAD

And, after creating the index, it would return:

Index Scan using my_index_name on public.my_table (cost=...) <- Index, GOOD

However, for some queries that used the same index with a few hundred records, it didn't make any difference. Reading through documentation, it is recommended that either run ANALYZE or have the "Autovacuum" daemon on. This way the database would know the size of tables and decide on query plans properly.

is this absolutely necessary in a production environment? In other words, will PostgreSQL use the index when it's time to use it without need to analyse or vacuum as an extra task?

Zero Distraction
  • 1,286
  • 3
  • 17
  • 24
  • The answer is: yes. On a table with just a few hundred rows it's not uncommon that a table scan is quicker than an index scan (especially if you are returning most of the rows). Unless you have some very uneven data distribution the optimizer will pick the best plan in most of the cases. See also my answer here: http://stackoverflow.com/a/5203827/330315 If you are (absolutely) sure a query would benefit from an index but the planner isn't picking it, then please post *that* query [including all relevant information](http://wiki.postgresql.org/wiki/SlowQueryQuestions) –  Sep 19 '13 at 07:47
  • Thanks for the comment. The query has a simple equals filter on a numeric column and no joins, like the example above. I usually keep things simple in the DB. But it Would be interesting to compare that "explain plan" index scare with other databases, I really did have to check a few times if the column name was right! Statistics collection seems to be enabled by default so I guess this question is not that much relevant in reality. – Zero Distraction Sep 19 '13 at 08:01
  • Yes, the default settings for auto vacuum and statistics collection should be good enough to start with. Again: if you have question about a specific query, please post it (maybe as a new question) –  Sep 19 '13 at 08:03

1 Answers1

1

Short answer "just run autovacuum." Long answer... yes, because statistics can get out of date.

Let's talk about indexes and how/when PostgreSQL decides to use them.

PostgreSQL gets a query in, parses it, and then begins the planning process. How are we going to scan the tables? How are we going to join them and in what order? These are not trivial decisions and trying to find the generally best ways to do things typically means that PostgreSQL needs to know something about the tables.

The first thing to note is that indexes are not always a win. No plan ever beats a sequential scan through a one-page table, and even a 5 page table will almost always be faster with a sequential scan than an index scan. So PostgreSQL cannot safely decide to "use all available indexes."

So the way PostgreSQL decides whether to use an index is to check statistics. Now, these go out of date, which is why you want autovacuum to be updating them. You say your table has a few hundred records and the statics were probably out of date. If PostgreSQL cannot say that the index is a win, it won't use it. A few hundred records is going to be approaching "an index might help" territory depending on how selective the index is in weeding out records.

In your large table, there was probably no question based on existing statistics that the index would help. In your smaller table, there probably was a question and it got answered one way based on the stats it had, and a different way based on newer stats.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182
  • Thanks for you answer. When I read the docos, it did look like "Vacuuming" was an extra thing, but it turns out that's the default behaviour. Cheers – Zero Distraction Nov 20 '13 at 00:45
  • 1
    It was made the default behavior I think in 8.3 iirc. Sometime in 8.x. Before that it had to be configured, an before that it had to be run manually. – Chris Travers Nov 20 '13 at 07:31