77

Postgres can round (truncate) timestamps using the date_trunc function, like this:

date_trunc('hour', val)
date_trunc('minute', val)

I'm looking for a way to truncate a timestamp to the nearest 5-minute boundary so, for example, 14:26:57 becomes 14:25:00. The straightforward way to do it is like this:

date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'

Since this is a performance-critical part of the query, I'm wondering whether this is the fastest solution, or whether there's some shortcut (compatible with Postgres 8.1+) that I've overlooked.

DNS
  • 37,249
  • 18
  • 95
  • 132
  • Why not turn that into a function and then index on it and see how fast it is then? Or just index right on that exact same logic without wrapping it in an index. either way you'll get an idea how fast it can be. – Scott Marlowe Sep 04 '11 at 23:29
  • See http://stackoverflow.com/a/8963684/287948 – Peter Krauss Dec 03 '15 at 08:34
  • 2
    A note, this method does not seem to round up to the "nearest" boundary, but rounds _down_ to the next _lowest_ boundary. i.e. '2017-04-01 00:04:00' appears to round to '2017-04-01 00:00:00', not _up_ to the _nearest_ boundary, which would be '2017-04-01 00:05:00' – DaveRGP Jul 18 '17 at 12:17
  • 1
    This operation soon will be much simpler with [date_bin](https://stackoverflow.com/a/66907341/5070879) function – Lukasz Szozda Apr 01 '21 at 15:27

4 Answers4

31

I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.

I informally benchmarked against one of our larger tables. I limited the query to the first 4 million rows. I alternated between the two queries in order to avoid giving one a unfair advantage due to db caching.


Going through epoch/unix time

SELECT to_timestamp(
    floor(EXTRACT(epoch FROM ht.time) / EXTRACT(epoch FROM interval '5 min'))
    * EXTRACT(epoch FROM interval '5 min')
) FROM huge_table AS ht LIMIT 4000000

(Note this produces timestamptzeven if you used a time zone unaware datatype)

Results

  • Run 1: 39.368 seconds
  • Run 3: 39.526 seconds
  • Run 5: 39.883 seconds

Using date_trunc and date_part

SELECT 
    date_trunc('hour', ht.time) 
    + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht LIMIT 4000000

Results

  • Run 2: 34.189 seconds
  • Run 4: 37.028 seconds
  • Run 6: 32.397 seconds

System

  • DB version: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
  • Cores: Intel® Xeon®, E5-1650v2, Hexa-Core
  • RAM: 64 GB, DDR3 ECC RAM

Conclusion

Your version seems to be faster. But not fast enough for my specific use case. The advantage of not having to specify the hour makes the epoch version more versatile and produces simpler parameterization in client side code. It handles 2 hour intervals just as well as 5 minute intervals without having to bump the date_trunc time unit argument up. On a end note, I wish this time unit argument was changed to a time interval argument instead.

André C. Andersen
  • 8,955
  • 3
  • 53
  • 79
  • 1
    The first version should use `floor` instead of casting to int - since the casting may round up leading to wrong results (`select 0.9::int -- =1`) – Pyrocks Oct 03 '19 at 08:02
  • 1
    @Pyrocks Thank you so much. I've worked with postgres for years, and now is the first time I learned that casting to int "rounds" instead of "truncates" decimals. All programming languages I've come across truncates, so I just assumed it applied to postgres (and other databases?). I've updated the answer, but I don't have time to rerun the query. I hope it doesn't affect it too much. – André C. Andersen Oct 05 '19 at 20:55
  • I would add function trunc_minute for convenience based on this answer: create or replace function trunc_minute(ts timestamptz, n int) returns timestamptz language sql as $sql$ select date_trunc('hour',ts)+trunc(date_part('minute',ts)/n)*n*interval '1 minute'$sql$; – okharch Feb 07 '23 at 08:57
23

I don't think there is any quicker method.

And I don't think you should be worried about the performance of the expression.

Everything else that is involved in executing your (SELECT, UPDATE, ...) statement is most probably a lot more expensive (e.g. the I/O to retrieve rows) than that date/time calculation.

8

Since Postgres 14, date_bin() is simplest and fastest:

date_bin('5 min', val, '2000-1-1')

The manual:

The function date_bin “bins” the input timestamp into the specified interval (the stride) aligned with a specified origin.

date_bin(stride, source, origin)

source is a value expression of type timestamp or timestamp with time zone. (Values of type date are cast automatically to timestamp.) stride is a value expression of type interval. The return value is likewise of type timestamp or timestamp with time zone, and it marks the beginning of the bin into which the source is placed.

Provide an "origin" of matching data type to avoid unexpected results from the cast ignoring time zones or assuming the wrong one.

My example looks like a date literal but serves as valid timestamp literal, too. If the time component is missing '00:00' is assumed.

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
3

Full query for those wondering (based on @DNS question):

Assuming you have orders and you want to count them by slices of 5min and shop_id:

SELECT date_trunc('hour', created_at) + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
      , shop_id, count(id) as orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC
Benjamin Crouzier
  • 40,265
  • 44
  • 171
  • 236