0

How can I identify the indexes that are worth to set on a sql table?

Take the following as an example:

select * 
from products 
where name = 'car' 
  and type = 'vehicle' 
  and availability > 3 
  and insertion_date > '2015-10-10' 
order by price asc 
limit 1

Imagine a database with a few million entries. Would there be benefits if I set an index on the combination of all attributes that occur in the WHERE and ORDER BY clause?

For the example:

create index i_my_idx on products
   (name, type, availability, insertion_date, price)
membersound
  • 81,582
  • 193
  • 585
  • 1,120
  • 2
    One cannot decide proper indices using a single example query. You need an overview over *all* queries, which are the most frequent and which are the most important. And it also depends on the rest of your setup. How many write operations (and what exactly)? Data distribution, cardinalities, ressources, etc. And you cannot say "sql table". SQL is the query language, a table is a table, a "database table" if you will. – Erwin Brandstetter Dec 18 '14 at 10:28
  • 1
    @a_horse_with_no_name: Actually, as long as there are equality conditions on *all* columns of the index, it doesn't matter *at all* which column comes first. [See this related answer on DBA with a test case.](http://dba.stackexchange.com/questions/33196/multicolumn-index-and-performance/33220#33220) In this case we have range and equality. [*Equality* should go *first*.](http://dba.stackexchange.com/questions/33196/multicolumn-index-and-performance/33220#33220) So the order of columns in the presented index is good. – Erwin Brandstetter Dec 18 '14 at 10:31
  • @a_horse_with_no_name thanks for your detailed comments, I see. So, let's take another example with flights: probably it will be good to add an index to `departure_airport` and `arrival_airport`. Imagine I get a result set of 1k entries. Would there be further benefit if putting eg an index on `departure_date`, whereas for a specific date there might only be like 10 entries? – membersound Dec 18 '14 at 11:06
  • For basic rules of thumb, read the [introduction in the manual](http://www.postgresql.org/docs/current/interactive/indexes-intro.html) and [this related answer](http://stackoverflow.com/questions/7434033/optimize-postgresql-read-only-tables/7438535#7438535). – Erwin Brandstetter Dec 18 '14 at 12:40

2 Answers2

0

There are a few rules of thumb that can be useful when deciding which columns to index:

  1. Make sure there's a unique index on the primary key - this is done automatically when you specify a PK in most RDBMSs including postgresql.
  2. Add indexes for each foreign key. These are created automatically in some RDBMSs when you specify a FK but not in postgresql.
  3. If a PK is a compound key, consider adding indexes on each FK making up the PK (except for the first, which is covered by the PK index). As in 2, some RDBMSs (e.g. MySQL with ISAM) add these indexes automatically when the FKs are specified.

Usually, but not always, table joins in queries will be PF to FK and by having indexes on both keys, the query optimizer of the RDBMS has flexibility in determining the optimum plan for maximum performance. This won't always be the best though, and experienced programmers will often format the SQL for a database query to influence the execution plan for best performance, or decide to omit indexes they know are not needed. It's worth noting that an SQL query that is optimal on one RDBMS is not necessarily optimal on another, or on future versions of the DB server, or as the database grows. The latter is important as in some RDBMSs such as postgres and Oracle, the query execution plans are dependent on the data in the tables (this is known as cost-based optimisation).

Once you've got these out of the way a lot comes down to experience and a knowledge of your data, and importantly, how the data is going to be accessed.

Generally you will be looking to index those columns which are best at filtering the data. In your query above, the obvious one is name. This might be enough to make that query run fast enough (unless all your products are cars).

Other than that it's worth making a list of the common ways the data is likely to be accessed e.g.

  1. Get a list of products that are in a category - an index on category will probably help
  2. However, get a list of products that are currently available - an index on availability will probably not help because a large proportion of products are likely to satisfy this condition.

Unless you are dealing with large amounts of data this can often be all you need to do, and it's not generally a good idea to add indexes "just in case" as there are overheads in maintaining them. But if your system does has performance issues, then it's worth considering how combinations of columns are being used in queries, reading up about the postgres query optimizer etc.

And to answer your last question - possibly, but it's far from the first thing to consider.

Sonic
  • 186
  • 6
  • 1
    "*Make sure there's a unique index on the primary key*" - it's impossible to have no index or a non-unique index on the PK in Postgres –  Dec 18 '14 at 10:50
  • Your "few rules" are no good. 1. is nonsense, the PK is implemented using a unique btree index automatically in Postgres. (To be perfectly clear, Postgres would allow to create another index on the same column(s), unique or not, but it would be pointless, expensive waste. 2. is mostly true. 3. is unfounded. Typically, a multi-column index is good enough, which is created automatically for a PK and makes sense to create in most cases for a FK. If you don't have additional requirements, you do not need additional indexes on parts of the key. – Erwin Brandstetter Dec 18 '14 at 11:12
  • These are rules of thumb for someone learning - you or I might do things differently. – Sonic Dec 18 '14 at 11:15
  • 1
    @Sonic: The rest of the answer is useful, but your leading "few rules" are mostly incorrect and misleading. – Erwin Brandstetter Dec 18 '14 at 12:39
  • I take your point - I've made changes to the first half of the answer, including a new paragraph. – Sonic Dec 18 '14 at 14:43
-2

Well the way you are setting indexes is absolutely correct. Indexes has nothing to do with order by clause.

Some important points while designing SQL query

  1. Always put the condition first in WHERE clause which will filter maximum rows for eg above query name ='car' will filter maximum records in products.

  2. Do not use ">=" use ">" only because greater or equal to will always end up in checking greater first if failed equals as well which will reduce performance of query.

  3. Create a single index in same order your where clause is arranged in.

  4. Try minimizing IN clause use ANY instead.

Thanks Anant

  • Is this postgresql specific advices, or general? – jarlh Dec 18 '14 at 10:04
  • 1
    Also the statement "*Indexes has nothing to do with order by clause*" is also not correct. Indexes _can_ help the database in sorting the result. –  Dec 18 '14 at 10:07