8

I have many read-only tables in a Postgres database. All of these tables can be queried using any combination of columns.

What can I do to optimize queries? Is it a good idea to add indexes to all columns to all tables?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
narduk
  • 964
  • 1
  • 9
  • 19
  • 2
    I'd try to establish where (on which queries) most time is spent, and in the first instance optimize for those. – NPE Sep 15 '11 at 16:15
  • Queries are dynamically generated and the only thing I know is that they are selects with filtering by one or more columns with an orderby clause (and limit/offset). There are no joins. – narduk Sep 16 '11 at 19:09

2 Answers2

17

Columns that are used for filtering or joining (or, to a lesser degree, sorting) are of interest for indexing. Columns that are just selected are barely relevant! For the following query only indexes on a and e may be useful:

SELECT a,b,c,d
FROM   tbl_a
WHERE  a = $some_value
AND    e < $other_value;

Here, f and possibly c are candidates, too:

SELECT a,b,c,d
FROM   tbl_a
JOIN   tbl_b USING (f)
WHERE  a = $some_value
AND    e < $other_value
ORDER  BY c;

After creating indexes (and possibly running ANALYZE or even VACUUM ANALYZE on the table), test to see if they are actually useful with EXPLAIN or EXPLAIN ANALYZE. Also compare execution times with and without the indexes. Deleting and recreating indexes is fast and easy. There are also parameters to experiment with EXPLAIN ANALYZE. The difference may be staggering or nonexistent.
As your tables are read-only, index maintenance is cheap after initial creation. It's merely a question of disk space.

If you really want to know what you are doing, start by reading the docs.

If you don't know what queries to expect ...

  1. Try logging enough queries to find typical use cases. Log queries with the parameter log_statement = all for that. Or just log slow queries using log_min_duration_statement.

  2. Create indexes that might be useful and check the statistics after some time to see what actually gets used. PostgreSQL has a whole infrastructure in place for monitoring statistics. One convenient way to study statistics (and many other tasks) is pgAdmin where you can chose your table / function / index and get all the data on the "statistics" tab in the object browser (main window).

  3. Proceed as described above to see if indexes in use actually speed things up.

  4. If the query planner should chose to use one or more of your indexes but to no or adverse effect then something is probably wrong with your setup and you need to study the basics of performance optimization: vacuum, analyze, cost parameters, memory usage, ...

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Queries are dynamically generated and the only thing I know is that they are selects with filtering by one or more columns with an orderby clause (and limit/offset). There are no joins. Do you consider adding indexes on all columns would be good practice in this case? – narduk Sep 16 '11 at 19:10
  • 2
    Just adding indexes on all columns may or may not help. Indexes do have costs, especially for heavily written tables. Less so for read-only tables. I added a paragraph to my answer above. – Erwin Brandstetter Sep 18 '11 at 15:50
  • Unfortunately, I won't be able to find typical use cases since I won't have access to the server once the project is finished. I wanted to do everything I could to optimize it in the development stage. Anyway, a lot of useful tips. Thanks. – narduk Sep 19 '11 at 11:20
  • Why is it that only indexes on a and e may be useful? – KyelJmD Aug 20 '13 at 15:52
  • @KyelJmD: That's true for the first example and the explanation is given in bold letters: `a` and `e` ar used to **filter** the result. (`WHERE a = $some_value AND e < $other_value`. – Erwin Brandstetter Aug 20 '13 at 16:55
  • How about for the second example? do the indexes of a ,e and f matters? – KyelJmD Aug 20 '13 at 23:00
  • Why is the index of F being used here? is it really good to add indexes on a foreign key? – user962206 Aug 21 '13 at 05:57
  • @user962206: Foreign key columns (on both sides) are *typical* candidates for an index. The referenced column has to be indexed anyway. The referencing column should be indexed, too, in many cases. But an index will only be used if the table is big enough and has enough different values so that only a small percentage of the whole table has to be retrieved. Else, it's faster to just scan the table sequentially. – Erwin Brandstetter Aug 21 '13 at 06:43
  • @KyelJmD: Those columns are typical candidates. Whether an index is used, depends on many details. Test as instructed above. [Consult the manual for basics to start with.](http://www.postgresql.org/docs/current/interactive/indexes-intro.html) – Erwin Brandstetter Aug 21 '13 at 06:46
1

If you have filtering by more columns indexes may help but not too much. Also indexes may not help for small tables.

First search for "postgresql tuning" - you will find usefull information.

If database can fit in memory - buy enough RAM. If database can not fit in memory - SSD will help. If this is not enough and database is read only - run 2, 3 or more servers. Or partition database (in the best case to fit in memory of each server).

Even if queries are generated I think they will not be random. Monitor database for slow queries and improve only them.

jordani
  • 21
  • 1