8

I want to improve this slow query, I thinking to add an index, but I don't know what index type is better to my case.

SELECT COUNT(*) ct FROM events
WHERE dtt AT TIME ZONE 'America/Santiago'
   >= date(now() AT TIME ZONE 'America/Santiago') + interval '1s'  

Query Plan:

"Aggregate  (cost=128032.03..128032.04 rows=1 width=0) (actual time=3929.083..3929.083 rows=1 loops=1)"
"  ->  Seq Scan on events  (cost=0.00..125937.68 rows=837742 width=0) (actual time=113.080..3926.972 rows=25849 loops=1)"
"        Filter: (timezone('America/Santiago'::text, dtt) >= (date(timezone('America/Santiago'::text, now())) + '00:00:01'::interval))"
"        Rows Removed by Filter: 2487386"
"Planning time: 0.179 ms"
"Execution time: 3929.136 ms"
  • The query gets the count of events of the day.
  • dtt is a timestamp with time zone column.
  • I'm using Postgresql 9.4.

Note: With the Erwin advices the query run a little faster but still I think isn't fast enough.

"Aggregate  (cost=119667.76..119667.77 rows=1 width=0) (actual time=3687.151..3687.152 rows=1 loops=1)"
"  ->  Seq Scan on vehicle_events  (cost=0.00..119667.14 rows=250 width=0) (actual time=104.635..3687.068 rows=469 loops=1)"
"        Filter: (dtt >= timezone('America/Santiago'::text, date_trunc('day'::text, timezone('America/Santiago'::text, now()))))"
"        Rows Removed by Filter: 2513337"
"Planning time: 0.164 ms"
"Execution time: 3687.204 ms"
Goku
  • 1,750
  • 5
  • 23
  • 35
  • 1
    Nitpicking, your query does ***not*** "get the count of events of the day". It gets the count of *all* events since the beginning of the day - including future timestamps if there are some, which might easily happen with multiple time zones. – Erwin Brandstetter Aug 17 '15 at 04:44

1 Answers1

11

First, fix your query to make the predicate "sargable":

SELECT count(*) AS ct
FROM   events
WHERE  dtt >= date_trunc('day', now() AT TIME ZONE 'America/Santiago')
                                      AT TIME ZONE 'America/Santiago';

Use the column value as is and move all calculations to the parameter.

That's right, after truncating to the local start of the day, apply AT TIME ZONE a second time to convert the timestamp value back to timestamptz again. See:

Explanation step-by-step

  1. now()
    .. is the Postgres implementation for the SQL standard CURRENT_TIMESTAMP. Both are 100 % equivalent, you can use either. It returns the current point in time as timestamptz - the display of the value takes the time zone of the current session into consideration, but that's irrelevant for the value.

  2. now() AT TIME ZONE 'America/Santiago'
    .. computes the local time for the given time zone. The resulting data type is timestamp. We do this to allow for:

  3. date_trunc( now() AT TIME ZONE 'America/Santiago' )
    .. truncates the time component to get the local start of the day in 'America/Santiago', independent of the current time zone setting.

  4. date_trunc('day', now() AT TIME ZONE 'America/Santiago') AT TIME ZONE 'America/Santiago'
    .. feeding the timestamp to the AT TIME ZONE construct we get the corresponding timestamptz value (UTC internally) to compare the timestamptz value dtt to.

I removed the + interval '1s', suspecting you have just been abusing that to convert the date to timestamp. Use date_trunc() instead to produce a timestamp value.

Now, a plain (default) btree index on dtt will do. Of course, the index will only be used, if the predicate is selective enough.

CREATE INDEX events_dtt_idx ON events (dtt);

If your important queries only consider recent rows, a partial index might help some more. Details:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks Erwin for your answer, I have now a better result, but the query still is slow, is needed do a reindex to my table? – Goku Aug 17 '15 at 04:30
  • @Goku: `REINDEX` is only necessary if the index is terribly bloated or corrupt. `VACUUM ANALYZE events` might help to give you an [index-only scan](https://wiki.postgresql.org/wiki/Index-only_scans). If you can't get index-only scans due to too much writing activity, clustering the table might make it faster. [Details here](http://stackoverflow.com/questions/13998139/optimize-postgres-timestamp-query-range/14007963#14007963) – Erwin Brandstetter Aug 17 '15 at 04:39
  • Did you forget to convert back to UTC? Or does that happen implicitly in the comparison? – Matt Johnson-Pint Aug 17 '15 at 17:43
  • @Matt: Neither. "Converting to UTC" is an implicit detail of converting to `timestamptz` - which is a UTC timestamp internally and works independent of time zones. Don't be confused by the somewhat misleading data type name: http://stackoverflow.com/a/28876266/939860. Follow the links for details. – Erwin Brandstetter Aug 17 '15 at 18:12
  • Ok, so `date_trunc` still returns a `timestamptz`. It's truncated to the start of the local date of the specified time zone, but it still internally tracks the UTC point in time - correct? Does the truncation properly deal with time zones that skip over midnight on a DST transition day? (Example, `America/Sao_Paulo` started the day at 1:00 AM on Oct 19, 2014.) – Matt Johnson-Pint Aug 17 '15 at 20:53
  • 1
    @MattJohnson: No. `date_trunc()` returns `timestamp` for `timestamp` input and `timestamptz` for `timestamptz`. I added an explanation step-by-step to clarify. – Erwin Brandstetter Aug 17 '15 at 22:32
  • Moving the `AT TIME ZONE` part from the column to the parameter improved the time of my query from 11 seconds to 4 ms(!). Thank you for your answer! – Wim Deblauwe Feb 14 '23 at 06:42