I am using the latest Docker image of Postgres 14.3 with TimescaleDB 2.7.0.
I am running some benchmarks to make sure TimescaleDB is the right solution for my client. I have a hypertable with 50 million rows. These were inserted in (approximately) chronological order — "approximately" because 4 parallel processes insert the rows, but they move forward hour by hour pretty much in sync.
I also have a continuous aggregate called daily_view, grouped by time (bucketed by day) and some categorisation criteria, mainly customer ID and type. There are 100,000 unique customer IDs in total, which according to this post should not be an issue, because TimescaleDB handles high cardinality (or so it is claimed).
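For reference, the aggregate is defined roughly like this (table and column names simplified here, not my exact schema):

```sql
CREATE MATERIALIZED VIEW daily_view
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS bucket,  -- daily buckets on the time column
       customer_id,                          -- ~100,000 distinct values
       type,
       count(*) AS n
FROM readings                                -- the 50M-row hypertable
GROUP BY bucket, customer_id, type;
```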
A simple query like:
select * from daily_view limit 1;
...
Time: 39429.423 ms (00:39.429)
Takes over 39 seconds!
A select count(*) from daily_view took 1 minute 43 seconds.
Strangely enough, when I dropped the materialized view of the continuous aggregate and recreated it on the exact same hypertable of 50 million rows, the same exact query:
select * from daily_view limit 1;
...
Time: 15.829 ms
Took just 15 ms!
A select count(*) took 9 seconds.
Obviously, the continuous aggregate is of no use if it can't be created beforehand and kept up to date as data comes in.
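In case it matters: to keep the aggregate updated I use the standard refresh policy, along these lines (the interval values here are illustrative, not my exact settings):

```sql
-- Refresh the last 3 days of buckets every hour, leaving the most
-- recent hour out so in-flight inserts aren't repeatedly re-materialized.
SELECT add_continuous_aggregate_policy('daily_view',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```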
Why is the performance so abysmal on the continuous aggregate? Why does it perform orders of magnitude faster when it is recreated from scratch?