4

I am trying to use the TimescaleDB extension to compute some continuous aggregates. I have this query which works fine:

SELECT distinct time_bucket('1 hour', entry_ts) as date_hour,
                type_id,
                entry_id,
                exit_id,
                count(*) OVER (partition by time_bucket('1 hour', entry_ts), entry_id, exit_id, type_id) AS total,
                ((count(*) over (partition by time_bucket('1 hour', entry_ts), entry_id, exit_id, type_id)) * 100)::numeric /
                (count(*) over (partition by time_bucket('1 hour', entry_ts), entry_id)) percentage_of_entry
FROM transits

When I try to put this inside a continuous aggregate materialized view, I get an error:

CREATE MATERIALIZED VIEW transits_hourly
            WITH (timescaledb.continuous) AS
SELECT distinct time_bucket('1 hour', entry_ts) as date_hour,
                type_id,
                entry_id,
                exit_id,
                count(*) OVER (partition by time_bucket('1 hour', entry_ts), entry_id, exit_id, type_id) AS total,
                ((count(*) over (partition by time_bucket('1 hour', entry_ts), entry_id, exit_id, type_id)) * 100)::numeric /
                (count(*) over (partition by time_bucket('1 hour', entry_ts), entry_id)) percentage_of_entry
FROM transits
WITH NO DATA

The error I get is:

ERROR:  invalid continuous aggregate view
SQL state: 0A000

Does TimescaleDB allow Continuous Aggregates over Partition By time windows?

I am using TimescaleDB 2.1 on PostgreSQL 12.5.

jbx
  • 21,365
  • 18
  • 90
  • 144
  • No, neither DISTINCT nor window functions are permitted according to the note section in https://docs.timescale.com/latest/api#continuous_aggregate-create_view – k_rus Mar 23 '21 at 14:24
  • Outside continuous aggregates window functions are supported on hypertables. – k_rus Mar 23 '21 at 18:42
  • @k_rus yes, in fact the query works. What I am looking for however is to pre-calculate these percentages. Is there any other way around this to pre-calculate percentages? – jbx Mar 24 '21 at 07:19
  • I posted an answer to your main question and enumerated few high level possibilities. However, I don't think any of them are good enough to have a nice solution to your issue without much effort. I believe considerable development work is needed to provide a general solution for such kind of use cases. – k_rus Mar 25 '21 at 09:04

1 Answers1

5

TimescaleDB is a PostgreSQL extension and allows the most of PostgreSQL's functionality. There is no known limitations on SELECT statements on hypertables.

However, continuous aggregates support limited queries, so it can maintain materialisation incrementally instead of refreshing entire materialisation, which will be expensive. Basically queries should allow to process each aggregate group independently from other groups, thus DISTINCT and window functions are not allowed.

The documentation of creating a continuous aggregate contains a note subsection with list of limitations on the SELECT statement. In particular:

Aggregates with ORDER BY, DISTINCT and FILTER clauses are not permitted.

Window functions cannot be used in conjunction with continuous aggregates.

Possible approaches to tackle limitations:

  • Create a continuous aggregate with permitted SELECT statement and then define a view on top of it, which will calculate the final result. This can allow to reduce the amount of data to process by the final view, but still can be expensive to execute.
  • Create a materialized view and create automation of refreshing it, e.g., with help of custom jobs. However, the refresh will recalculate entire materialisation.
  • If you have a good idea how calculate the query on a portion of data, you can write an insert script into another table, which will be specially created for storing the materialization. Then materialisation can be automated with, e.g., custom jobs.
k_rus
  • 2,959
  • 1
  • 19
  • 31
  • Let's say I reverted back to a `group by` continuous aggregate, which stores the counts grouped by `entry_id`, `exit_id` and `type_id` for each `time_bucket`. Would it be possible to somehow create another continuous aggregate on top of it that calculates the percentage from summing the counts of the same `entry_id` within the same time bucket? – jbx Mar 25 '21 at 13:57
  • There is not support to create a continuous aggregate on top of another continuous aggregate in TimescaleDB. Only one and only hypertable can be in the query for a continuous aggregate. – k_rus Mar 25 '21 at 19:39