
I have a table with an id column, specified as follows:

CREATE TABLE foo
(
    id integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    name text,
    description text,
    CONSTRAINT foo_pkey PRIMARY KEY (id)
);

I am updating this table quite frequently (every 5 minutes), but I am using `ON CONFLICT DO NOTHING`.
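For reference, the recurring batch insert looks roughly like this (a minimal sketch; the unique constraint on name is an assumption, since the table definition above only shows the primary key):

-- Sketch of the recurring batch insert; assumes a UNIQUE constraint
-- on name, which is not shown in the table definition above.
INSERT INTO foo (name, description)
VALUES
    ('widget', 'a widget'),
    ('gadget', 'a gadget')
ON CONFLICT DO NOTHING;

Rows skipped by ON CONFLICT still consume identity values, because the value is generated before the conflict check.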

As described in this post, "serial in postgres is being increased even though I added on conflict do nothing", it is expected that the id increments with this setup.

However, based on this post, "What happens to the primary key Id when it goes over the limit?", I do need to worry about the counter going over the limit.

How do I prevent an error from occurring in this scenario?
To be clear, I don't need the values to be gap-less, nor do I necessarily need them to be in sequence; I just need uniqueness, and I would like to use as small a data type as possible (I would like to switch to smallint at some point).

Possible solutions I can see:

  1. Manually ensure that no duplicates would be inserted before performing the actual insert, as is done in this solution: POSTGRES - prevent serial incrementation with ON CONFLICT DO NOTHING (see the sketch after this list). Downside: doesn't this check the uniqueness constraint twice per insert?
  2. Somehow have postgres use gap-less sequential values. Downside: this does not seem to be how postgres is intended to work.
  3. Have postgres restart the counter when it reaches the limit rather than throwing an error: https://www.postgresql.org/docs/9.5/sql-createsequence.html says there is a CYCLE option, which sounds like exactly what I need.
  4. Just go with it and accept that I need to use bigint.
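A minimal sketch of option 1, assuming the incoming batch sits in a staging table and that name is the uniqueness key (both the table and the key are illustrative assumptions, not from the linked post):

-- Filter out rows that already exist, so the identity column is only
-- evaluated for rows that are actually inserted; "staging" and the
-- unique key on name are assumptions for illustration.
INSERT INTO foo (name, description)
SELECT s.name, s.description
FROM staging s
WHERE NOT EXISTS (
    SELECT 1 FROM foo f WHERE f.name = s.name
)
ON CONFLICT DO NOTHING; -- safety net: a concurrent insert can still race past NOT EXISTS

With this shape the sequence only advances for genuinely new rows; the ON CONFLICT clause remains only for the rare concurrent race, which can still consume the odd sequence value.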

Update

So far:

  1. Feasible option, but as mentioned there is a CPU resource penalty.
  2. Bad idea, based on this post: PostgreSQL gapless sequences.
  3. Not possible, since this would lead to duplicate ids.
  4. Best option in most cases.

For my particular case, I would like to further determine whether option 1 might not be better than option 4 after all:

  • The table with the ids has about 20k rows
  • A batch insert in which nearly all of the 20k items are duplicates is made every 5 min
  • The table which references the one with the ids contains 10bn+ rows and runs on timescaledb
  • If we use bigint (8 bytes) instead of int (4 bytes), we use 10bn × (8 − 4) bytes, ca. 40GB of additional uncompressed space
  • I guess with timescaledb this would be compressed down to maybe 10GB of additional space used for the ids

Is it still better to go for option 4 over option 1 in this case, given the infrequent updates made to the table and the space needed for storing the ids?

sev

1 Answer


I think that the correct solution is another one: change the data type of the identity column to bigint. Then you won't run out of sequence values.

Performing that change with a simple ALTER TABLE will rewrite the table and lock it for the duration of the statement. There are more complicated ways to do it with less downtime; see for example this answer.
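A minimal sketch of the simple variant, using the table from the question:

-- Rewrites the whole table and holds an ACCESS EXCLUSIVE lock
-- until the rewrite finishes.
ALTER TABLE foo ALTER COLUMN id TYPE bigint;

PostgreSQL should also adjust the identity sequence's data type (and a MAXVALUE sitting at the old type's limit) along with the column. Note that any column referencing foo.id would eventually need the same change to hold the larger values, which is where most of the rewrite cost lands with a large referencing table.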

Laurenz Albe
  • Thank you for your answer. The reason I would prefer not to use this method is that I need to reference this id in a very large second table. As it has very many rows, the space savings from using `smallint` vs `bigint` start to become relevant. What is the issue with using the `cycle` option? – sev Jun 02 '21 at 16:05
  • The cycle option repeats values, guaranteeing either duplicate ids or that you delete data before it cycles. Storage space is cheap compared to the alternatives. – Belayer Jun 02 '21 at 17:48
  • @Belayer Good point. To give some quick reference: the table I am inserting into has about 20k rows, and the table which references it has about 10bn rows. So uncompressed, bigint would use about 80GB and smallint 20GB, a difference of 60GB. As that table uses timescaledb, I would guess that compressed the 60GB difference reduces to maybe 10GB. Is 10GB worth less than the computational cost of checking uniqueness in a batch insert of 10k values into a 20k-row table every 5 min? (obv. just looking for an approximate answer, not math) – sev Jun 02 '21 at 18:38
  • Perhaps you should update the question with that information. – Belayer Jun 02 '21 at 18:55