2

I'm querying a PostgreSQL table for rows in a range of days. BETWEEN is a common recommendation:

WHERE ts BETWEEN '2015-09-01' AND '2015-10-01'

I've also seen EXTRACT used:

WHERE EXTRACT(month from ts) = 9

Assuming ts is an indexed timestamp column, can EXTRACT match the speed of BETWEEN? If so, I'm especially curious how it's implemented.

Community
  • 1
  • 1
hurrymaplelad
  • 26,645
  • 10
  • 56
  • 76
  • It cannot, since the `EXTRACT(day from ts)` is not indexed. Take the address book and find all people whose second letter in the first name is `a` (eg `Daniel`) – zerkms Sep 22 '15 at 01:14

1 Answers1

9

A b-tree index, which is the default index type, can be used for operations >, <, >=, <= and = i.e. ordering and equality.

When you use a function to transform the output in a way that transforms the value so it can't be looked up literally in the index then you can't use the index for a lookup.

This is the case with EXTRACT(day from ts). You can't do a b-tree lookup on the transformed column. Theoretically PostgreSQL could still do a b-tree index lookup if it could prove that the expression preserved ordering for all cases, as is true for extract(epoch from ts) ... but at the moment the optimizer doesn't know how to do that. Even if it could, extract(day from ...) doesn't preserve ordering of inputs, in that a "lower" timestamp can have a higher "day", so a b-tree lookup cannot be used.

ts BETWEEN a AND b is just a shorthand for writing ts >= a and ts <= b. Both are b-tree indexable operations.

You can create a new b-tree index on an expression, like

create index mytable_ts_day on mytable(extract(day from ts))

and this index then can be used for a matching expression like extract(day from ts) = 9, or any other operator on the expression that's b-tree indexable. However, it means you have to maintain another index, which has a cost for each insert and update performed, and competes for cache space.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 1
    that totally answers my question. Part of me hoped Postgres could optimize an EXTRACT expression, perhaps even transform it to combinations of <= and >=. Sounds like an opportunity to contribute :) – hurrymaplelad Sep 22 '15 at 04:25
  • @hurrymaplelad I think it's possible for functions that're linear mappings to another space, yes, such that for any `a <= x <= b`, `f(a) <= f(x) <= f(b)`. You'd need a way to annotate such functions to mark them. Then it'd be possible to call `f` on each index key and compare it to the operator's other input. So you'd have to teach the query optimizer (or even rewrite engine) to make the transformation. This would not help your current query, though, since `extract(day from ts)` is not such a transformation (it gets the day of month). `extract(epoch ...)` would be OK or `date_trunc('day', ts)` – Craig Ringer Sep 22 '15 at 04:35