What is the right way to index a postgres table when doing a query with two fields?

Question

If I have a large table with:

varchar foo
integer foo_id
integer other_id
varchar other_field

And I might be doing queries like:

select * from table where other_id=x

Obviously I need an index on other_id to avoid a table scan.

If I'm also doing:

select * from table where other_id=x and other_field='y'

Do I want another index on other_field or is that a waste if I never do:

select * from table where other_field='y'

i.e. I only use other_field with other_id together in a query.

Would a compound index of both [other_id, other_field] be better? Or would that cause a table scan for the 1st simple query?

Pavel Horal · Accepted Answer · 2014-10-27T12:44:48.503

Use EXPLAIN and EXPLAIN ANALYZE, if you are not using these two already. Once you understand query plan basics you'll be able to optimize database queries pretty effectively.

Now to the question - saying anything without knowing a bit about the values might be misleading. If there are not that many other_field values for any specific other_id, then a simple index on other_id would be enough. If there are many other_field values per other_id (e.g. thousands), I would consider making the compound index.
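One way to look at that distribution before choosing an index is to count rows and distinct other_field values per other_id. This is only a sketch: my_table is a hypothetical name standing in for the table in the question, and the query reads the whole table, so it may be slow on a large one.

```sql
-- my_table is a hypothetical name; the columns come from the question.
-- Lists the 20 other_id values with the most rows, together with the
-- number of distinct other_field values each of them has.
SELECT other_id,
       count(*)                    AS row_count,
       count(DISTINCT other_field) AS distinct_other_field
FROM my_table
GROUP BY other_id
ORDER BY row_count DESC
LIMIT 20;
```

If row_count stays small for every other_id, the plain index on other_id is probably enough. Large counts point towards the compound index.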

Do I want another index on other_field or is that a waste if I never do:

Yes, that would very probably be a waste of space. Postgres is able to combine two indexes, but the conditions must be just right for that.

Would a compound index of both [other_id, other_field] be better?

Might be.

Or would that cause a table scan for the 1st simple query?

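No. Postgres is able to use a multi-column index when the query constrains only its first column, so an index on [other_id, other_field] can also serve the first simple query. (Saying it works only for the first column is not exactly true - check the answer comments.)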

The basic rule is - get a real data set and prepare the queries you are trying to optimize. Run EXPLAIN ANALYZE on those queries. Try to rewrite them (e.g. joins instead of subselects or vice versa) and check the performance (EXPLAIN ANALYZE). Try to add indexes where you feel they might help and check the performance (EXPLAIN ANALYZE)... if an index does not help, don't forget to drop it.
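That loop might look roughly like the following. Treat it as a sketch under assumptions: my_table, the index name and the literal values 42 and 'y' are placeholders that do not come from the question. Keep in mind that EXPLAIN ANALYZE really executes the query.

```sql
-- my_table, the index name and the literal values are placeholders.

-- 1. Baseline: note the chosen plan and the actual time.
EXPLAIN ANALYZE
SELECT * FROM my_table WHERE other_id = 42 AND other_field = 'y';

-- 2. Add a candidate index and refresh the planner statistics.
CREATE INDEX my_table_other_id_other_field_idx
    ON my_table (other_id, other_field);
ANALYZE my_table;

-- 3. Re-run both queries and compare against the baseline.
EXPLAIN ANALYZE
SELECT * FROM my_table WHERE other_id = 42 AND other_field = 'y';
EXPLAIN ANALYZE
SELECT * FROM my_table WHERE other_id = 42;

-- 4. If plans and timings did not improve, remove the index again.
DROP INDEX my_table_other_id_other_field_idx;
```

On a small test table the planner may still pick a sequential scan, which is why a realistic data set matters here.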

And if you are still having problems and your data set is big (tens of millions of rows or more), you might need to reconsider running specific queries at all. A different approach might be needed (e.g. batch / async processing) or a different technology for the specific task.

thanks, this helped a lot. I will mark this answer the best when SO lets me. — Andrew Arrow, Oct 23 '14 at 21:37
Pretty much rewritten the answer. There were some mistakes :). — Pavel Horal, Oct 23 '14 at 21:43
"*Postgres is able to use multi-column index only for the first column*" - that's not entirely true: http://stackoverflow.com/q/26503743/330315 — , Oct 24 '14 at 16:20
Interesting, didn't know about this. Added comment to the mentioned sentence. Thank you. — Pavel Horal, Oct 27 '14 at 12:46

score 0 · Answer 2 · answered Oct 23 '14 at 21:52

If other_id is highly selective, then you might not need an index on other_field at all. If only a few rows match other_id=x in the index, looking at each of them to see if they also match other_field=y might be fast enough to not bother with more indexes.

If it turns out that you do need to make the query faster, then you almost surely want the compound index. The standalone index on other_field is unlikely to help.
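A cheap first look at how selective other_id is can come from the planner's own statistics in the pg_stats view. This is a sketch, with my_table as a hypothetical table name; the figures are estimates and are only present after the table has been analyzed.

```sql
-- my_table is a hypothetical name; run ANALYZE first so pg_stats is populated.
ANALYZE my_table;

-- n_distinct > 0: estimated number of distinct values in the column.
-- n_distinct < 0: negated ratio of distinct values to rows
--                 (-1 means every row has a different value).
SELECT attname,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'my_table'
  AND attname IN ('other_id', 'other_field');
```

Many distinct other_id values and low frequencies for the most common ones suggest that few rows match any single other_id=x, which is the case where the extra index is least needed.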

Branko Dimitrijevic · Answer 3 · 2014-10-25T21:25:11.537

The accepted answer is not entirely accurate - if you need all three queries mentioned in your question, then you'll actually need two indexes.

Let's see which indexes satisfy which WHERE clause in your queries:

                               {other_id} {other_id, other_field} {other_field, other_id} {other_field}
other_id=x                     yes        yes                     no                      no
other_id=x and other_field='y' partially  yes                     yes                     partially
other_field='y'                no         no                      yes                     yes

So to satisfy all 3 WHERE clauses, you'll need:

either an index on {other_id} and a composite index on {other_field, other_id}
or an index on {other_field} and a composite index on {other_id, other_field}
or a composite index on {other_id, other_field} and a composite index on {other_field, other_id}.¹

Depending on the distribution of your data, you could also get away with {other_id} and {other_field}, but you should measure carefully before opting for that solution. Also, you may consider replacing * with a narrower set of fields and then covering them by indexes, but that's a whole other topic...

¹ "Fatter" solution than the other two - consider only if you have specific covering needs.
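Written out as DDL, the first two options above could look like this. It is a sketch only: my_table and the index names are hypothetical, and you would create one pair, not both.

```sql
-- my_table and all index names are hypothetical.

-- Option 1: {other_id} plus {other_field, other_id}
CREATE INDEX my_table_other_id_idx
    ON my_table (other_id);
CREATE INDEX my_table_other_field_other_id_idx
    ON my_table (other_field, other_id);

-- Option 2: {other_field} plus {other_id, other_field}
-- CREATE INDEX my_table_other_field_idx
--     ON my_table (other_field);
-- CREATE INDEX my_table_other_id_other_field_idx
--     ON my_table (other_id, other_field);
```

If the query on other_field alone is never run, as the question says, the single index on (other_id, other_field) from option 2 covers the two remaining queries.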

The OP states that the third query is never used, which makes the accepted answer correct. — MatBailie, Oct 25 '14 at 21:34
