0

I have a table data that looks something like this (simplified for the post):

|insert_ts          | actual_ts           | group_id| amount|
|2022-02-22 00:01:01| 2022-02-06 05:05:01 | 1       |    100|

And a hypertable created as follows:

CREATE MATERIALIZED VIEW hourly_view
    WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', actual_ts) as date_hour,
       group_id,
       count(*)                           total,
       avg(amount)                        avg_amount
FROM data
GROUP BY date_hour,
         group_id
WITH NO DATA;

With a continuous aggregate policy that looks like this:

SELECT add_continuous_aggregate_policy('hourly_view',
                                       start_offset => INTERVAL '1 month',
                                       end_offset => INTERVAL '1 hour',
                                       schedule_interval => INTERVAL '1 hour');

This seems to work normally with data that is inserted in sequence close to the current timestamp. However, when inserting backdated data that is still within the 1 month range, the data did not appear in the view.

In my case, the insert_ts timestamp is telling me when the record was actually inserted. I had records that were inserted between 2022-02-16 and 2022-02-23 but their actual_ts (on which I am bucketing the data) was between 2022-02-02 and 2022-02-06.

I was expecting that the continuous aggregate policy would have everything in sync by the 2022-02-23, since it should be refreshing 1 month of data every hour. But the data was missing from the aggregate.

I checked the postgresql logs for the 23rd of February and there were the following:

2022-02-23 22:01:58.526 UTC [31958] LOG:  refreshing continuous aggregate "hourly_view" in window [ 2022-01-23 23:00:00+00, 2022-02-23 21:00:00+00 ]
2022-02-23 22:01:58.542 UTC [31958] LOG:  job 1002 threw an error
2022-02-23 22:01:58.543 UTC [31958] ERROR:  cannot execute SQL without an outer snapshot or portal
2022-02-23 22:01:58.543 UTC [31958] CONTEXT:  SQL statement "DELETE FROM _timescaledb_internal._materialized_hypertable_4 AS D WHERE D.date_hour >= '2022-01-28 18:00:00+00' AND D.date_hour < '2022-02-23 21:00:00+00' ;"

When I ran the following command I the data was refreshed successfully:

CALL refresh_continuous_aggregate('vehicle_transit_hourly_view', NULL, NULL);

I am currently running Postgres 13 with TimescaleDB 2.1.

jbx
  • 21,365
  • 18
  • 90
  • 144

1 Answers1

0

This is a bug that should be fixed, depending on what version you are running. There are a couple of related issues. If you are already on release 2.4.1 of TimescaleDB or above, some users have found that the problem persisted because they hadn't run ALTER EXTENSION after upgrading.

If this doesn't apply then this may be a different issue and I think your best route might be to raise a new GitHub issue with (if possible) a reproducible example.

Here are the related issue links:

Transparency: I work for Timescale

greenweeds
  • 524
  • 2
  • 13