4

If I want to group a column of timestamps, say registered_at by the day on which they occurred, I can use either date_trunc('day', registered_at) or registered_at::date. The first removes the hours and smaller units from the timestamp, but still returns a timestamp, while the latter returns the timestamp cast to a date. Now, I'm wondering if any of the two has a better performance than the other.

When I look at the query plans, the theoretical costs are exactly the same, and there can be a lot of noise in the actual execution times.

-- SELECT date_trunc('day', registered_at)
Seq Scan on customers  (cost=0.00..5406.45 rows=23987 width=8) (actual time=0.023..46.811 rows=24436 loops=1)
  Filter: (created_at > '2019-06-01 00:00:00'::timestamp without time zone)
  Rows Removed by Filter: 113958
Planning time: 0.107 ms
Execution time: 48.158 ms

-- SELECT registered_at::date
Seq Scan on customers  (cost=0.00..5406.45 rows=23987 width=4) (actual time=0.017..34.353 rows=24436 loops=1)
  Filter: (created_at > '2019-06-01 00:00:00'::timestamp without time zone)
  Rows Removed by Filter: 113958
Planning time: 0.121 ms
Execution time: 35.548 ms

Does anyone have an idea which method would be faster, either when truncating or when subsequently using group by?

Ruben Helsloot
  • 12,582
  • 6
  • 26
  • 49
  • Run `explain (analyze, buffers)` and most likely you will see that the second execution simply got more blocks from the cache. –  Jul 03 '19 at 19:19
  • I found `Buffers: shared hit=4910` on both, but if you mean that that would explain the difference in execution times, I'm not looking at those anyways, because they're probably too noisy to determine anything from – Ruben Helsloot Jul 03 '19 at 19:25
  • The number of buffers retrieved is actually a good metric for tuning. The less buffers your query needs the better it will scale. It doesn't really matter if those buffers are retrieved from the cache or not. So do look at those when you evaluate the performance of different solutions. –  Jul 04 '19 at 07:37

1 Answers1

0

If the result varies if you run the test several times, it could be a caching issue.

Also there is always som “random noise” in query execution times.

If you repeat the experiment many times, and the result you get is statistically significantly different for the two cases, and the queries are the same with the exception of a different function that is called for each result row, then the difference has to be the function execution time.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263