I have a table with an id column, specified as follows:
CREATE TABLE foo
(
id integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
name text,
description text,
CONSTRAINT foo_pkey PRIMARY KEY (id)
);
I am updating this table quite frequently (every 5 min), but I am using ON CONFLICT DO NOTHING.
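A minimal sketch of the effect, assuming the batch insert conflicts on a unique constraint on name (the table definition above only shows the primary key, so this constraint is an assumption):

-- Assumed unique constraint that the ON CONFLICT clause trips over
-- (not shown in the table definition above):
ALTER TABLE foo ADD CONSTRAINT foo_name_key UNIQUE (name);

-- First insert succeeds and consumes id 1:
INSERT INTO foo (name, description)
VALUES ('a', 'first') ON CONFLICT DO NOTHING;

-- This insert does nothing, but still consumes id 2, because the
-- identity value is generated before the conflict is detected:
INSERT INTO foo (name, description)
VALUES ('a', 'duplicate') ON CONFLICT DO NOTHING;

-- The next successful insert therefore gets id 3, not 2:
INSERT INTO foo (name, description)
VALUES ('b', 'second') ON CONFLICT DO NOTHING;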
As described in the post "serial in postgres is being increased even though I added on conflict do nothing", it is expected that the id increments with this setup.
However, based on the post "What happens to the primary key Id when it goes over the limit?", I do need to worry about the counter going over the limit.
How do I prevent an error from occurring in this scenario?
To be clear, I don't need the values to be gapless, nor do I necessarily need them to be in sequence; I just need uniqueness, and I would like to use as small a datatype as possible (I would like to switch to smallint at some point).
Possible solutions I can see:
1. Manually ensure that no duplicates will be inserted before performing the actual insert, as is done in this solution: "POSTGRES - prevent serial incrementation with ON CONFLICT DO NOTHING" (sketched after this list). Downside: am I not now checking the uniqueness constraint twice per insert?
2. Somehow have Postgres use gapless sequential values. Downside: this does not seem to be how Postgres is intended to work.
3. Have Postgres restart the counter when it wraps around rather than throwing an error: https://www.postgresql.org/docs/9.5/sql-createsequence.html says there is a CYCLE option, which sounds like exactly what I need (also sketched below).
4. Just go with it and accept that I need to use bigint.
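A sketch of option 1, assuming the incoming batch sits in a staging table (batch_foo is a made-up name) and that name is the column the conflicts occur on:

-- Insert only rows that do not already exist, so identity values are
-- not burned on the duplicates in each batch:
INSERT INTO foo (name, description)
SELECT b.name, b.description
FROM batch_foo b
WHERE NOT EXISTS (SELECT 1 FROM foo f WHERE f.name = b.name)
ON CONFLICT DO NOTHING;  -- kept as a safety net against concurrent inserts

This is where the double check comes from: NOT EXISTS filters first, and the unique index is checked again on insert; the ON CONFLICT clause then only burns ids for rows that slip through concurrently.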
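For reference, option 3 would look like this; identity columns accept the same sequence options as sequences, so CYCLE can be set via ALTER TABLE:

-- Make the identity counter wrap to MINVALUE instead of erroring out.
-- As the update below notes, this only trades one error for another:
-- once wrapped, the generated ids already exist, so inserts fail on
-- the primary key instead.
ALTER TABLE foo ALTER COLUMN id SET CYCLE;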
Update
Evaluating the options so far:
1. Feasible, but as mentioned there is a CPU cost for the extra existence check.
2. Bad idea, based on the post "PostgreSQL gapless sequences".
3. Not possible: after the counter cycles, it hands out ids that already exist, which leads to duplicate key errors.
4. Best option in most cases (a sketch follows).
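For completeness, a sketch of what option 4 involves (big_table and foo_id are placeholder names for the referencing table and its foreign key column):

-- Widen the identity column; if the sequence's MAXVALUE does not
-- follow the type change automatically, raise it explicitly:
ALTER TABLE foo ALTER COLUMN id TYPE bigint;
ALTER TABLE foo ALTER COLUMN id SET MAXVALUE 9223372036854775807;

-- The referencing column in the large table must be widened too,
-- which is where the ~40 GB estimate below comes from; rewriting a
-- 10bn-row table is itself a significant operation:
ALTER TABLE big_table ALTER COLUMN foo_id TYPE bigint;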
For my particular case, I would like to determine whether option 1 might in fact be better than option 4:
- The table with the ids has about 20k rows.
- A batch insert of nearly all 20k (mostly duplicate) items is made every 5 min, so roughly 5.76M identity values are burned per day (20,000 × 288 batches), and an int counter would overflow in about a year.
- The table that references the id table contains 10bn+ rows and runs on TimescaleDB.
- Using bigint (8 bytes) instead of int (4 bytes) costs roughly 10bn × (8 − 4) bytes ≈ 40 GB of additional uncompressed space for the ids.
- I guess TimescaleDB compression would bring this down to maybe 10 GB of additional space used for the ids.
Given the infrequent updates to the id table and the space cost of storing the wider ids, is it still better to go for option 4 over option 1 in this case?