1

I have a Rails app on a Postgres database which relies heavily on queries like this:

SELECT DISTINCT client_id FROM orders WHERE orders.total>100

I need, essentially, the ids of all the clients who have orders which meet a certain condition. I only need the id, so I figured this is way faster than using joins.

Would I benefit from adding an index to the column "total"? I don't mind insert speed, I just need the query to run extremely fast.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Nicolas
  • 2,297
  • 3
  • 28
  • 40
  • As *always*: PostgreSQL version? Table definition? – Erwin Brandstetter Dec 09 '12 at 19:41
  • Postgres 9.1.3 on Heroku. What do you mean by table definition? – Nicolas Dec 09 '12 at 19:54
  • That's a table definition: `CREATE TABLE foo (id integer ...)`. I am asking, because data types and primary key and other stuff matter, – Erwin Brandstetter Dec 09 '12 at 19:56
  • Oh I see. "orders" is a float. I dont really have the table definition cause the question is a made up example of something similar but much more complicated. I realised that I needed a multicolumn indices more than I thought, because queries have complicated conditions over more fields together, not just "orders". However, I created the indices and got a 10x performance increase! Thank you very much – Nicolas Dec 09 '12 at 20:05
  • 1
    Remember to weigh the cost of specialized indices against their benefit. You wouldn't create a separate multicolumn index for every rarely used query. PostgreSQL *can* combine indexes with pretty good performance. – Erwin Brandstetter Dec 09 '12 at 20:45

3 Answers3

4

I would expect the following multicolumn index to be fastest:

CREATE INDEX orders_foo_idx ON orders (total DESC, client_id);

PostgreSQL 9.2 could benefit even more. With it's "index-only tuples" feature, it could serve the query without hitting the table under favorable circumstances: no writes since the last VACUUM.

DESC or ASC hardly matters in this case. A B-tree index can be searched in both directions almost equally efficient.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

Absolutely. With no index on the total column, this query will require a table scan. With an index on the total column, it will require an index seek and key lookup. This will provide your query with huge performance gains as the size of the table grows.

  • 1
    Whether or not the index will be used depends on how many rows qualify for `amount > 100`. If more that ~20% of the rows are returned by the condition, then the index won't be used. –  Dec 09 '12 at 19:43
  • Thanks, actually VERY few rows meet each condition. – Nicolas Dec 09 '12 at 19:49
1
>  I only need the id, so I figured this is way faster than using joins.

True, though I'm not sure why you would consider using joins in the first place in this case.

As cmotley said, you're going to require an index on the total column for this query. However, optimal performance is going to depend on exactly which queries you're running. For example, for this query, with this table structure, the fastest you're going to get is to create an index like so:

CREATE INDEX IX_OrderTotals ON orders (total, client_id)

By including the client_id in the index, you create something called a covered index on the client_id column, so the database engine won't have to look up the row behind the scenes in order to fetch your data.

Community
  • 1
  • 1
Dave Markle
  • 95,573
  • 20
  • 147
  • 170