50

I could not reach any conclusive answers reading some of the existing posts on this topic.

I have certain data at 100 locations the for past 10 years. The table has about 800 million rows. I need to primarily generate yearly statistics for each location. Some times I need to generate monthly variation statistics and hourly variation statistics as well. I'm wondering if I should generate two indexes - one for location and another for year or generate one index on both location and year. My primary key currently is a serial number (Probably I could use location and timestamp as the primary key).

Thanks.

let_there_be_light
  • 837
  • 3
  • 9
  • 15

3 Answers3

48

Regardless of how many indices have you created on relation, only one of them will be used in a certain query (which one depends on query, statistics etc). So in your case you wouldn't get a cumulative advantage from creating two single column indices. To get most performance from index I would suggest to use composite index on (location, timestamp).

Note, that queries like ... WHERE timestamp BETWEEN smth AND smth will not use the index above while queries like ... WHERE location = 'smth' or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth will. It's because the first attribute in index is crucial for searching and sorting.

Don't forget to perform

ANALYZE;

after index creation in order to collect statistics.

Update: As @MondKin mentioned in comments certain queries can actually use several indexes on the same relation. For example, query with OR clauses like a = 123 OR b = 456 (assuming that there are indexes for both columns). In this case postgres would perform bitmap index scans for both indexes, build a union of resulting bitmaps and use it for bitmap heap scan. In certain conditions the same scheme may be used for AND queries but instead of union there would be an intersection.

Ildar Musin
  • 1,290
  • 1
  • 11
  • 11
  • 21
    About `only one of them will be used in a certain query` actually Postgres does have the ability to use multiple indexes to satisfy a single query. Check this chapter in the manual: https://www.postgresql.org/docs/current/indexes-bitmap-scans.html Specifically `Fortunately, PostgreSQL has the ability to combine multiple indexes` – Daniel Mar 10 '19 at 19:04
  • 1
    What if I have multicolumn B-tree index `(location, timestamp)`, and I use one of them in `where` closure. For example `where location = smth` or `timestamp = smth` will the index I have created work? – Mukhammadsher Jun 15 '19 at 17:49
  • awesome answer! – John Apr 30 '20 at 16:39
11

There is no rule of thumb for situations like these, I suggest you experiment in a copy of your production DB to see what works best for you: a single multi-column index or 2 single-column indexes.

One nice feature of Postgres is you can have multiple indexes and use them in the same query. Check this chapter of the docs:

... PostgreSQL has the ability to combine multiple indexes ... to handle cases that cannot be implemented by single index scans ....

... Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the index-combination feature ...

You can even experiment creating both the individual and combined indexes, and checking how big each one is and determine if it's worth having them at the same time.

Some things that you can also experiment with:

  • If your table is too large, consider partitioning it. It looks like you could partition either by location or by date. Partitioning splits your table's data in smaller tables, reducing the amount of places where a query needs to look.
  • If your data is laid out according to a date (like transaction date) check BRIN indexes.
  • If multiple queries will be processing your data in a similar fashion (like aggregating all transactions over the same period, check materialized views so you only need to do those costly aggregations once.

About the order in which to put your multi-column index, put first the column on which you will have an equality operation, and later the column in which you have a range, >= or <= operation.

Daniel
  • 21,933
  • 14
  • 72
  • 101
2

An index on (location,timestamp) should work better that 2 separate indexes for you case. Note that the order of the columns is important.

redneb
  • 21,794
  • 6
  • 42
  • 54
  • Thank you. How does the order of the columns effects performance? – let_there_be_light Sep 02 '16 at 18:21
  • 1
    If your search criteria is like `location="something" AND timestamp BETWEEN something AND something`, then an index on `(location,timestamp)` can be used for that, whereas an index on `(timestamp,location)` could not. – redneb Sep 02 '16 at 18:28
  • 1
    Not true. Order of where conditions is not important at all. See here https://dba.stackexchange.com/a/115759 – T3rm1 Jul 04 '18 at 15:05
  • 8
    @T3rm1 I am referring to the order of columns in the definition of a multicolumn index. This is totally different than what your link is about, which is about the order of conditions in a `WHERE` clause. In the former case the order matters, in the latter it doesn't. – redneb Jul 05 '18 at 16:50