
I have a PostgreSQL table which I am trying to convert to a TimescaleDB hypertable.

The table looks as follows:

CREATE TABLE public.data
(
    event_time timestamp with time zone NOT NULL,
    pair_id integer NOT NULL,
    entry_id bigint NOT NULL,
    event_data int NOT NULL,
    CONSTRAINT con1 UNIQUE (pair_id, entry_id),
    CONSTRAINT pair_id_fkey FOREIGN KEY (pair_id)
        REFERENCES public.pairs (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)

When I attempt to convert this table to a TimescaleDB hypertable using the following command:

SELECT create_hypertable(
        'data',
        'event_time',
        chunk_time_interval => INTERVAL '1 hour',
        migrate_data => TRUE
);

I get the following error:

ERROR: cannot create a unique index without the column "event_time" (used in partitioning)

Question 1: From this post, "How to convert a simple postgresql table to hypertable or timescale db table using created_at for indexing", my understanding is that this is because I have specified a unique constraint (con1) that does not contain the column I am partitioning by, event_time. Is that correct?

Question 2: How should I change my table or hypertable to be able to convert it? I have added some details on how I plan to use the data and on the structure of the data below.

Data Properties and usage:

  • There can be multiple entries with the same event_time - those entries have entry_ids that are in sequence
    • This means that if I have 2 entries (event_time 2021-05-18::10:16, id 105, <some_data>) and (event_time 2021-05-18::10:16, id 107, <some_data>), then the entry with id 106 would also have event_time 2021-05-18::10:16
  • The entry_id is not generated by me, and I use the unique constraint con1 to ensure that I am not inserting duplicate data
  • I will query the data mainly on event_time, e.g., to create plots and perform other analysis (see the example query after this list)
  • At this point the database contains around 4.6 billion rows, but it should contain many more soon
  • I would like to take advantage of TimescaleDB's speed and good compression
  • I don't care too much about insert performance
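
To illustrate the typical access pattern, a hypothetical query (the pair_id value and time range are made up):

-- Typical read: a time-range scan for plotting
SELECT event_time, event_data
FROM data
WHERE pair_id = 1
  AND event_time >= '2021-05-18 00:00+00'
  AND event_time <  '2021-05-19 00:00+00'
ORDER BY event_time;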

Solutions I have been considering:

  • Pack all the events which have the same timestamp into an array somehow and keep them in one row. I think this would hurt compression and provide less flexibility for querying the data. I would also probably end up having to unpack the data on each query.
  • Remove the unique constraint con1 - then how do I ensure that I don't add the same row twice?
  • Expand the unique constraint con1 to include event_time (see the sketch after this list) - would that not decrease performance while also opening the door to accidentally inserting 2 rows with the same entry_id and pair_id but different event_time? (I doubt this is likely to happen, though)
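
For reference, expanding the constraint would be something like this sketch (untested against my real schema):

-- Replace con1 with a version that includes the partition column
ALTER TABLE public.data DROP CONSTRAINT con1;
ALTER TABLE public.data
    ADD CONSTRAINT con1 UNIQUE (pair_id, entry_id, event_time);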
sev

1 Answer


You understand correctly that UNIQUE (pair_id, entry_id) doesn't allow creating a hypertable from the table, since unique constraints need to include the partition key, i.e., event_time in your case.

  1. I don't follow how the first option, where records with the same timestamp are packed into a single record, would help with uniqueness.

  2. Removing the unique constraint will allow creating the hypertable, but, as you mentioned, you lose the ability to enforce the constraint (a possible insert-time workaround is sketched below).
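
A sketch of guarding against duplicates at insert time without the constraint (illustrative values; note the NOT EXISTS check is not safe under concurrent writers):

-- Insert only if no row with this (pair_id, entry_id) exists yet
INSERT INTO data (event_time, pair_id, entry_id, event_data)
SELECT '2021-05-18 10:16+00', 1, 106, 42
WHERE NOT EXISTS (
    SELECT 1 FROM data WHERE pair_id = 1 AND entry_id = 106
);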

  3. Adding the time column, e.g., UNIQUE (pair_id, entry_id, event_time), is quite a common approach, but it allows inserting duplicates with different timestamps, as you mentioned. It will also perform worse than option 2 during inserts. You can replace the index on event_time (which you need, since you query on this column, and which is created automatically by TimescaleDB) with the unique index, so you save a little, e.g.,

CREATE UNIQUE INDEX indx ON data (event_time, pair_id, entry_id);
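
If you take this route, a sketch of skipping the default event_time index when creating the hypertable (create_default_indexes is a standard create_hypertable parameter):

-- Skip the automatic event_time index and rely on the unique index above
SELECT create_hypertable(
        'data',
        'event_time',
        chunk_time_interval => INTERVAL '1 hour',
        create_default_indexes => FALSE,
        migrate_data => TRUE
);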
  4. Manually create a unique constraint on each chunk table. This will guarantee uniqueness within a chunk, but it will still be possible to have duplicates in different chunks. The main drawback is that you will need to figure out how to create the constraint whenever a new chunk is created.
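
A sketch of what that could look like for a single chunk (the chunk name _hyper_1_1_chunk is hypothetical; real names can be listed with show_chunks('data')):

-- Hypothetical chunk name; list actual chunks with: SELECT show_chunks('data');
CREATE UNIQUE INDEX data_chunk_uniq
    ON _timescaledb_internal._hyper_1_1_chunk (pair_id, entry_id);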

Unique constraints that don't include the partition key are not supported in TimescaleDB, since checking uniqueness would require accessing all existing chunks, which would kill performance (or it would require a global index, which can be large). I don't think unique constraints are a common case for time-series data, as they are usually tied to artificially generated counter-based identifiers.

k_rus
  • From your answer I am leaning towards solution 3, as I don't care too much about insert performance and I think entry_id identifies event_time uniquely. I don't quite understand your last sentence though. I don't have an index on event_time? What would I save? In the end I think I only need an index on event_time and pair_id, since I will be looking up based on those 2 values. – sev May 18 '21 at 14:17
  • @sev When you create the hypertable, TimescaleDB will create the index on `event_time` automatically. Sorry for not being clear. – k_rus May 18 '21 at 14:33
  • Ah ok, so in addition to the unique constraint on (event_time, pair_id, entry_id) I would create an index on (event_time, pair_id, entry_id) in Postgres before migrating to TimescaleDB, and then tell it not to create the default index? – sev May 18 '21 at 14:38
  • I updated the option 3 with the information about the default index. – k_rus May 18 '21 at 14:42
  • @sev It's enough to create a unique index. See the syntax in my answer. – k_rus May 18 '21 at 14:43