2

I'm wondering if anyone has any recommendations on how to optimize this query so it can use an index? We have reservations that are associated to properties in our database. Each property has a timezone set. What we want to query for is "all reservations that start on a given date (e.g. today)".

The relevant query fragment is

reservations.start_on::Date = COALESCE((current_timestamp at time zone properties.time_zone), current_timestamp)::Date

The properties table is joined earlier in the query as you would expect.

The problem is, as you probably guessed, timezones. We can't simply ask for start_on = {date} because when we say {date} we really mean {date in that property's timezone} which may not actually be {date} based on the current time. {date} will come in as, e.g. 4/7/2014 based on the timezone of our App server but we actually want to include reservations that start on 4/8/2014 if the reservation's property is in Sydney Australia.

---- EDIT ----

Some additional information.

reservations.start_on is stored as type date.

properties.time_zone is stored as character varying(255), default 'America/Los_Angeles'::character varying.

The reservation's start_on date is stored as the date were you in the timezone of the property. The part that needs to be interpreted is the date we are querying for to shift it to the appropriate date in the various property timezones in most cases.

I'm certainly open to other approaches but this was all I could think to do so far... Basically I want to be able to ask a few questions of the DB:

  1. What reservations are starting "today"?
  2. What reservations have start dates in the future?
  3. What reservations have start dates in the past?

Start date, being a date though, is obviously actually a time range so you can't just say start_on > now() for example because timezones.

Table Definitions: https://gist.github.com/anonymous/10295380

---- EDIT 2 ----

I tried switching over to using tsrange for start_on. The range is time shifted from the property's timezone to UTC. That means a start_on of 4/11/2014 in Sydney is stored as ['4/10/2014 14:00:00', '4/11/2014 13:59:59']. I have a gist index on the tsrange version of the start_on column.

This seems to work perfectly and return the right results. It uses the gist index for @> queries with a specific time and is blazing fast. Unfortunately it is NOT using the gist index for all range operations...

Here is a gist (haha) with example EXPLAINs on some of our queries: https://gist.github.com/bdmac/10496601

The column shown in the examples, arrival_day is equivalent to start_on but is a tsrange. I have not removed the old start_on column yet.

Some of the queries that are now slow are "upcoming" or "past" reservations where I had to construct a tsrange with no upper or lower bound depending. I can't seem to figure out how to do a query with << or >> that accepts an element instead of a range like you can do with @>

  • The table definition should be in the question, showing details about involved columns (`\d tbl` in psql). And your Postgres version. – Erwin Brandstetter Apr 07 '14 at 18:57
  • Thanks, I updated the question with some more info and linked to the table definitions for properties and reservations. – Brian McManus Apr 09 '14 at 17:37
  • So far the only thing I can think to do is to store a start_on_utc column that pre-calculates the date relative to UTC such that start_on may not always equal start_on_utc. Then use start_on_utc for these queries. The downside here is that if a property is edited to adjust its timezone (for example it was misconfigured) then we would need to cascade that change through to all the reservations at that property and recalculate start_on_utc. – Brian McManus Apr 09 '14 at 18:03
  • If `reservations.start_on` is of type `date`, why do you *cast* it to `date` in the example? Also, are you interested in "today" according to the current time zone or according to the time zone of the respective entry? – Erwin Brandstetter Apr 09 '14 at 20:20
  • Timezone of the entity. – Brian McManus Apr 11 '14 at 19:06
  • You may be overthinking this. You do *not* need a `tsrange` for the original question. You need a [`timestamptz`](http://www.postgresql.org/docs/current/interactive/rangetypes.html) (the absolute time). Or if you need ranges, make that a **`tstzrange`**. If you need to know *where* it happens (the *local* date of the event depends on that - as opposed to the date of the current time zone, which does not depend on it), store the *time zone* additionally. All I need is a crystal clear definition what you mean by **"today"** to provide an answer. Clean up your question. – Erwin Brandstetter Apr 11 '14 at 22:28
  • Take a look at the answer in this question: http://dba.stackexchange.com/questions/151771/postgresql-at-time-zone-construct-does-not-use-index – yglodt Oct 10 '16 at 18:41

2 Answers2

3

If you actually have a column of type timestamp and interpret it (in parts) depending on the current time zone, and this time zone can vary, then an index is generally impossible. You can only build an index on IMMUTABLE data ...

After Update:

To answer these questions:

  1. What reservations are starting "today"?
  2. What reservations have start dates in the future?
  3. What reservations have start dates in the past?

... you best store a timestamp with time zone. Just a date is not precise enough.

As long as we are only interested in the local "today" (as defined by the current time zone), we do not need to save the time zone explicitly. We do not care where in the world it happens, we only need an absolute time to compare to.

Then, to get reservations starting "today" simply:

SELECT *
FROM   reservations
WHERE  start_on::date = current_date;

But this is not sargable because start_on::date is a derived expression and we cannot build a functional index for this, either, (without dirty tricks) because the expression depends on the current time zone and is not IMMUTABLE.

Instead, compare to the start and end of "our" day in UTC time:

SELECT *
FROM   reservations
WHERE  start_on >= current_date::timestamptz
AND    start_on < (current_date + 1)::timestamptz; -- exclude upper border

Now, this simple index can support the query:

CREATE INDEX ON reservations (start_on);

Demo

SQL Fiddle is down ATM. Here is a little demo to help understanding:

CREATE TEMP TABLE reservations (
   reservation_id serial
 , start_on timestamptz NOT NULL
 , time_zone text);    -- we don't need this

INSERT INTO reservations (start_on, time_zone) VALUES
  ('2014-04-09 01:00+02', 'Europe/Vienna')
, ('2014-04-09 23:00+02', 'Europe/Vienna')
, ('2014-04-09 01:00+00', 'UTC')    -- the value is independent of the time zone
, ('2014-04-09 23:00+00', 'UTC')    -- only display depends on current time zone
, ('2014-04-09 01:00-07', 'America/Los_Angeles')
, ('2014-04-09 23:00-07', 'America/Los_Angeles');

SELECT start_on, time_zone 
     , start_on::timestamp             AS local_ts
     , start_on AT TIME ZONE time_zone AS ts_at_tz
     , current_date::timestamptz       AS lower_bound
     , (current_date + 1)::timestamptz AS upper_bound
FROM   reservations
WHERE  start_on >= current_date::timestamptz
AND    start_on < (current_date + 1)::timestamptz;

More explanation and links here:
Ignoring timezones altogether in Rails and PostgreSQL

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • It looks like you're suggesting to store the start_on as a timestamptz? The thing I struggle with is that a date is actually a range of times. I actually tried a version that used a start_on_utc which was actually stored as a timestamp (w/out zone because Rails seems to expect everything in Postgres to be w/out zone). This did not actually end up working at all though and the queries were returning incorrect results. This may have been because I was trying to use start_on_utc = date instead of checking between two times. What "time" do you use for start_on in your example? Midnight? – Brian McManus Apr 11 '14 at 19:48
  • I tried to come up with a better solution to this using tsrange for start_on (converted to UTC tsrange based on property's timezone). This mostly seems to work great but a few queries are just ignoring the gist index now. Will add to question. – Brian McManus Apr 11 '14 at 19:51
  • @BrianMcManus: The link in the last line of my answer should be a good start. – Erwin Brandstetter Apr 11 '14 at 19:59
  • It is but still stuck. Have a branch where I'm trying this and tests don't pass. I'm using a `arrival_day_start` timestamp w/out zone and storing start_on @ midnight shifted to UTC. If we pretend we are running the query on 4/10 at midnight then `current_date::timestamp` is '2014-04-10 00:00:00' because it's still 4/10 in UTC. The problem is that a reservation that starts on 4/10 in Sydney has 4/10 midnight shifted to 4/9 14:00 UTC so it doesn't show up in the results. That's why I switched to ranges but maybe I'm still missing something. – Brian McManus Apr 11 '14 at 21:08
  • I created a new gist showing where the problem with your solution shows up: https://gist.github.com/bdmac/10505604 – Brian McManus Apr 11 '14 at 22:11
0

Take a look at my answer here, there is a workaround which could also work for you:

https://dba.stackexchange.com/questions/151771/postgresql-at-time-zone-construct-does-not-use-index/151776#151776

Community
  • 1
  • 1
yglodt
  • 13,807
  • 14
  • 91
  • 127