
I am using the latest Docker images of Postgres 14.3 and TimescaleDB 2.7.0.

I am running some benchmarks to make sure TimescaleDB is the right solution for my client. I have a hypertable with 50 million rows. These were inserted in approximately chronological order ("approximately" because 4 parallel processes insert the rows, but they move hour by hour pretty much in sync).

I also have a continuous aggregate called daily_view, bucketing the time by day and grouping by some categorisation criteria, mainly customer ID and type. There are 100,000 unique customer IDs in total, which according to this post should not be an issue, because TimescaleDB handles high cardinality (or so it is claimed).
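For context, my setup looks roughly like this (table and column names here are illustrative, not my exact schema):

-- Raw data table; actual columns differ.
create table raw_events (
    ts          timestamptz not null,
    customer_id int         not null,
    type        text        not null,
    value       double precision
);

-- Hypertable with the default chunk_time_interval (7 days).
select create_hypertable('raw_events', 'ts');

-- Continuous aggregate bucketed by day, grouped by customer and type.
create materialized view daily_view
with (timescaledb.continuous) as
select time_bucket('1 day', ts) as day,
       customer_id,
       type,
       count(*)   as n,
       avg(value) as avg_value
from raw_events
group by day, customer_id, type;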

A simple query like:

select * from daily_view limit 1;
...
Time: 39429.423 ms (00:39.429)

Takes over 39 seconds!

Doing a `select count(*) from daily_view` took 1 minute 43 seconds.

Strangely enough, when I dropped the materialized view of the continuous aggregate and recreated it on the exact same hypertable of 50 million rows, the same exact query:

select * from daily_view limit 1;
...
Time: 15.829 ms

Took just 15 ms!

A `select count(*)` took 9 seconds.

Obviously, the continuous aggregate is of no use if it can't be created beforehand and updated as data comes in.

Why is the performance so abysmal on the continuous aggregate? Why does it perform orders of magnitude faster when it is recreated from scratch?

– jbx

1 Answer


Continuous aggregates are also hypertables. You can probably use `approximate_row_count`, which should be faster than the regular count function.
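For example, something like this (assuming the continuous aggregate is named daily_view; if passing the view name directly doesn't work in your version, point it at the aggregate's materialization hypertable, which is listed in timescaledb_information.continuous_aggregates):

-- Approximate count from catalog estimates; avoids a full scan.
select approximate_row_count('daily_view');

The estimate is only as accurate as the catalog statistics, so it may lag behind until autovacuum/analyze has run.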

– jonatasdp
  • The issue is not with the count; I don't need that per se. Other queries, even simply selecting the first row with `limit 1`, are very slow. My worry is that the continuous aggregate is very slow in general once cardinality gets high. I also can't understand why dropping the continuous aggregate and recreating it on exactly the same data suddenly speeds things up immensely. – jbx Jun 11 '22 at 21:54
  • Thanks for sharing the details. If you can share a reproducible example, I'd encourage you to file it as an issue on GitHub so it can be prioritized as a bug fix. Another thing I'd investigate is the hypertable's chunk time interval: if it's too small, the hypertable breaks down into too many chunks, and queries need to scan more of them (see the snippet after these comments). – jonatasdp Jun 13 '22 at 13:23
  • What do you mean by hypertable time interval? My hypertable just has one timestamp field, which has an index. The materialized view then time-buckets it by day, as I said above. If you are referring to the `chunk_time_interval` of the hypertable of the raw data, it is left at the default (which I think is 7 days). – jbx Jun 14 '22 at 21:19
  • Yes, I mean `chunk_time_interval`. – jonatasdp Jun 19 '22 at 11:19
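For anyone checking the chunk layout discussed in these comments, something like this shows it (the hypertable name raw_events is the illustrative one from the setup sketch above):

-- How many chunks does each hypertable currently have?
select hypertable_name, count(*) as chunks
from timescaledb_information.chunks
group by hypertable_name;

-- Change the interval used for future chunks (illustrative value).
select set_chunk_time_interval('raw_events', interval '14 days');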