
BACKGROUND/CONTEXT

If I define a sequence and a table as follows:

CREATE SEQUENCE public.test_sequence
    CYCLE
    MINVALUE 1
    MAXVALUE 2147483647;

CREATE TABLE public.test_table
(
    id integer NOT NULL DEFAULT nextval('test_sequence'::regclass),
    name text,
    CONSTRAINT test_table_pkey PRIMARY KEY (id)
);

And insert a ton of values, I will eventually reach the end of the sequence. Because of the CYCLE keyword, nextval will wrap back to the beginning of the sequence and once again return 1, 2, etc. Unfortunately, if an old record with the same id still exists, the insert collides with it and fails with a "duplicate key value violates unique constraint" error. I was somewhat surprised by this result; I had made the (obviously incorrect) assumption that nextval was slightly more intelligent and meant the next free id value, which would clearly be more useful in this context.
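For anyone who wants to reproduce the failure without actually inserting billions of rows, a quick sketch (not part of my original setup) is to use setval() to jump the sequence straight to its MAXVALUE, so that the very next nextval() wraps:

    -- Hypothetical reproduction of the collision described above.
    INSERT INTO public.test_table (name) VALUES ('first');      -- gets id 1
    SELECT setval('public.test_sequence', 2147483647);          -- park the sequence at MAXVALUE
    INSERT INTO public.test_table (name) VALUES ('wrapped');    -- nextval() cycles back to 1
    -- ERROR:  duplicate key value violates unique constraint "test_table_pkey"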

According to the following thread it appears that I am perhaps not alone in making this assumption and that others may be running into this same issue: How do I avoid nextval() colliding with existing entries on wrapping?

There are obviously a few possible workarounds to this problem:

  1. Migrate to a bigger data type such as bigint.
  2. Migrate to using UUID values instead.
  3. Introduce an ON CONFLICT DO NOTHING clause to the INSERTs.
  4. Introduce an ON CONFLICT DO UPDATE clause to the INSERTs (options 3 and 4 are sketched just after this list).
  5. Add a BEFORE INSERT trigger to increment the nextval of the sequence until we reach a free one.
  6. etc.?
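For illustration only (this is not what I ended up choosing), options 3 and 4 would look roughly like this against the test_table above; the name values are just placeholders:

    -- Option 3: silently skip the colliding row (the new data is simply lost).
    INSERT INTO public.test_table (name) VALUES ('example')
    ON CONFLICT (id) DO NOTHING;

    -- Option 4: overwrite the existing row that currently holds the recycled id.
    INSERT INTO public.test_table (name) VALUES ('example')
    ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;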

For those who are interested, the following article explores the differences between a few of the above options: Choosing a Postgres Primary Key

For a variety of reasons, and for better or worse, I chose option 5 to solve this issue. I would appreciate it if we could avoid debating which is the better solution here, especially since the best choice can depend on a person's particular requirements. I would, however, be interested to know if there are other unique workarounds that I did not consider above.

To further clarify our requirements: we are running on an embedded device and space is finite. We have recently run out of space, and we have barely started populating our major tables, so any option that increases our space usage is not necessarily our best option. We do not currently have any hard limits on time, so we can afford a performance penalty here. For our needs we have therefore chosen option 5; for other people's requirements, any of the other options above may indeed be a better choice.

QUESTION

Considering that we have already chosen option 5, what is the best way to implement it?

I am interested in getting feedback on the database triggers I wrote to handle this situation. Specifically:

  • Does anybody notice a concurrency problem with the locking solution I went with?

  • Are there better ways to write this trigger?

  • etc.

  • What are some of the gotchas/footguns of this approach?

    A1 - @Ry- makes a good point that the key reuse can lead to race conditions (if I'm paraphrasing correctly). One example scenario involves 3 transactions: the first does a DELETE on id 5, the second later does an INSERT that reuses id 5, and the third tries to do an UPDATE on id 5. Depending on when the third transaction arrives (before or after the other two), it will update either the old record or the new one, and one of those might not have been the intended target. Hence the race and the possible data corruption. I suppose this particular case could be mitigated by always doing a SELECT ... FOR UPDATE to check the previous state before doing the UPDATE (within the same transaction), but that incurs further performance overhead and risks users forgetting to do so. A rough sketch of that pattern follows.
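    This is only a sketch against test_table from above (id 5 and the new value are placeholders), and it only helps if every writer follows the same convention:

        BEGIN;
        -- Lock the row and let the application verify it is still the record
        -- it intends to change (e.g. by checking the returned name) before
        -- issuing the UPDATE inside the same transaction.
        SELECT id, name FROM public.test_table WHERE id = 5 FOR UPDATE;
        UPDATE public.test_table SET name = 'new value' WHERE id = 5;
        COMMIT;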

A2 - @Atmo makes a good point that there will be a performance penalty for this function, depending on how full (or sparse) the id space is after you wrap around. The worst case is when there are no free ids left at all: the trigger will deadlock if you are locking, or loop forever (at least until someone else deletes a record) if you are not. It should be noted that the time penalty is linear, since each candidate id only needs to be checked once and the check only hits the index, not the table rows. Mitigation techniques for this scenario include regular cleanup (if possible) to keep the number of entries in the table low, and checking for wrap-around in the trigger to avoid the deadlock/looping (a sketch of such a guard follows the trigger code below).

A3 - Cycling/wrapping the id values means that you can no longer rely on the id values for chronological ordering/sorting (if you happen to be doing so).

And here is the associated trigger code:

CREATE OR REPLACE FUNCTION fix_id_collision()
RETURNS TRIGGER AS $$
DECLARE
  found_id integer;
  column_default text;
BEGIN
  -- Loop until we find a free id value.
  LOOP
    -- Check if the id already exists in the table.
    -- Use row level "FOR UPDATE" locking to hopefully ensure
    -- that concurrent INSERT queries don't receive the same id
    -- and collide.
    EXECUTE format('SELECT id FROM %I.%I WHERE id = %L FOR UPDATE', TG_TABLE_SCHEMA, TG_TABLE_NAME, NEW.id) INTO found_id;
    IF found_id IS NULL THEN
      RETURN NEW;
    END IF;
    -- Re-evaluate the column default of the id column (i.e. the nextval()
    -- call on the sequence) to get the next candidate id.
    EXECUTE format('SELECT column_default FROM information_schema.columns WHERE table_schema=%L AND table_name=%L AND column_name=%L', TG_TABLE_SCHEMA, TG_TABLE_NAME, 'id') INTO column_default;
    EXECUTE format('SELECT %s', column_default) INTO NEW.id;
  END LOOP;
END;
$$
LANGUAGE plpgsql;
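As noted in A2 above, the trigger as written can loop forever (or deadlock) if the table has no free ids left. Below is a minimal sketch of a wrap-around guard. It assumes the sequence is attached to the id column with OWNED BY (pg_get_serial_sequence needs that, which is not the case for my test_table above, where the trigger reads the column default instead), and the function name and first_tried_id variable are mine, invented for illustration:

    CREATE OR REPLACE FUNCTION fix_id_collision_with_guard()
    RETURNS TRIGGER AS $$
    DECLARE
      found_id integer;
      first_tried_id integer;
    BEGIN
      LOOP
        -- Same existence check as in the trigger above.
        EXECUTE format('SELECT id FROM %I.%I WHERE id = %L FOR UPDATE', TG_TABLE_SCHEMA, TG_TABLE_NAME, NEW.id) INTO found_id;
        IF found_id IS NULL THEN
          RETURN NEW;
        END IF;
        -- Guard: if the sequence has cycled all the way around to the first
        -- id we tried, every id is taken; fail instead of looping forever.
        IF first_tried_id IS NULL THEN
          first_tried_id := NEW.id;
        ELSIF NEW.id = first_tried_id THEN
          RAISE EXCEPTION 'no free id values left in %.%', TG_TABLE_SCHEMA, TG_TABLE_NAME;
        END IF;
        -- Assumes the sequence is OWNED BY the id column.
        NEW.id := nextval(pg_get_serial_sequence(format('%I.%I', TG_TABLE_SCHEMA, TG_TABLE_NAME), 'id'));
      END LOOP;
    END;
    $$
    LANGUAGE plpgsql;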

And some code to install it for all the tables in my database:

CREATE OR REPLACE FUNCTION install_id_collision_triggers()
RETURNS VOID
AS $$
DECLARE
  tbl_schema text;
  tbl_name text;
BEGIN
  FOR tbl_schema, tbl_name IN SELECT table_schema, table_name FROM information_schema.columns WHERE column_name='id'
  LOOP
    EXECUTE format('DROP TRIGGER IF EXISTS id_collision_trigger_%s ON %I.%I', tbl_name, tbl_schema, tbl_name);
    EXECUTE format('CREATE TRIGGER id_collision_trigger_%s BEFORE INSERT ON %I.%I FOR EACH ROW EXECUTE FUNCTION fix_id_collision()', tbl_name, tbl_schema, tbl_name);
  END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT install_id_collision_triggers();

P.S. I am already aware that I can do a "CREATE OR REPLACE TRIGGER" in recent PostgreSQL versions but I happen to be using an older version for testing so please excuse the extra step.
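For completeness, on PostgreSQL 14 and later the DROP TRIGGER IF EXISTS / CREATE TRIGGER pair could presumably be collapsed into a single statement, for example per table:

    -- PostgreSQL 14+ only.
    CREATE OR REPLACE TRIGGER id_collision_trigger_test_table
      BEFORE INSERT ON public.test_table
      FOR EACH ROW EXECUTE FUNCTION fix_id_collision();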

FOLLOW UP

@klin makes an interesting point here that there might not be a space difference between using the integer and bigint/bigserial data types (on 64-bit systems) due to data type alignment, as described in his post: Benchmark: bigint vs int on PostgreSQL. I have run a very, very basic test and can confirm that there is only a marginal increase in space usage between choosing integer and bigint; it worked out to approximately 4% more space for bigint in my case. Consequently, the space savings were not as great as I originally thought when choosing this particular approach, and I can now recommend against using this workaround, especially in light of the potential for data corruption that it introduces.
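I am not reproducing my exact test here, but a minimal comparison along these lines should be enough for anyone who wants to check the numbers on their own schema (the table names and row count are just placeholders):

    -- Hypothetical size comparison between integer and bigint keys.
    CREATE TABLE size_test_int    (id integer NOT NULL, name text);
    CREATE TABLE size_test_bigint (id bigint  NOT NULL, name text);
    INSERT INTO size_test_int    SELECT g, 'row ' || g FROM generate_series(1, 100000) g;
    INSERT INTO size_test_bigint SELECT g, 'row ' || g FROM generate_series(1, 100000) g;
    SELECT pg_size_pretty(pg_total_relation_size('size_test_int'))    AS int_size,
           pg_size_pretty(pg_total_relation_size('size_test_bigint')) AS bigint_size;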

Chris Malek
  • Not an answer to your question but I happened to have written a [short explanation](https://stackoverflow.com/questions/75265400/whenever-my-pc-crashes-my-sql-db-adds-1000-to-the-next-id/75265556#75265556) about why a sequence works the way it does including, amongst other things, why it does not check for existing values in tables. – Atmo Apr 05 '23 at 15:18
  • *Migrate to a bigger data type such as bigint and hope you never reach the end.* A bigserial PK table fed with a million rows per second will cycle after 292471 years. – klin Apr 05 '23 at 15:38
  • Thanks @Atmo. That is a good explanation. It is unfortunate that their requirements are in conflict with usage of the CYCLE keyword. Making the sequence requirements compatible with the CYCLE requirements would obviously likely come at the cost of some reduced performance. It is maybe however a bit sad that instead of these two concepts "just working" together that they blow up in users' faces. I think some users might be willing to sacrifice that additional performance for something that simply works out of the box. Would be nice to have the option either way. – Chris Malek Apr 05 '23 at 15:48
  • Whose requirements are in conflict? Anyway, honestly, if you insert so many records that you make a `bigserial` sequence cycle, this additional performance as you call it becomes a predominant functionality in and of itself. See my answer below. – Atmo Apr 05 '23 at 15:56
  • Moreover, I think you underestimate the cost of checking existing values. Sure, you could afford checking a few values but what if the gap between the current value and the next is 1 trillion? or 1 quadrillion? That means to insert 1 record, you expect the database to spend **several years** looking for the next available value. Assuming the table is not loaded in memory, it will be from the disk, which is several times worse... No user will want that. – Atmo Apr 05 '23 at 16:03
  • @Atmo The conflict is when the CYCLE wraps and starts hitting values from the previous sequence. CYCLE's requirement is to wrap the values of the sequence so we can keep inserting values and the sequence's requirement is to return values quickly. – Chris Malek Apr 05 '23 at 16:15
  • As I and @klin said, you're simply never going to reach the point where a `bigserial` sequence needs to cycle so that should not be something for you to consider. There is simply no hardware, worldwide, powerful enough that it had to cycle such sequence yet. We are talking about a billion records per second, 24/7 for almost 300 years. Are you saying your use case will breach these figures? For the record, my computer does not reach 1/10000th of that. – Atmo Apr 05 '23 at 16:20
  • This is a bad idea for all the reasons that have already been mentioned and more. It sets up silent data corruption footguns when reuse is close to deletion. – Ry- Apr 05 '23 at 20:43
  • You don't want to debate solutions, but you would like to know if there are other workarounds? Huh? That sounds self-contradictory. There is no real question in this, except "please review my code". Even if you don't want to hear that, the only solution that will work well is to use `bigint`. – Laurenz Albe Apr 06 '23 at 06:25
  • I’m voting to close this question because it is not really asking a clear question. It explicitly does not want to discuss solutions, only asks for code review of an inferior solution. – Laurenz Albe Apr 06 '23 at 06:27
  • @Ry- If you could please elaborate on what data corruption "footguns" this creates, so I could avoid them? – Chris Malek Apr 06 '23 at 10:45
  • You can’t avoid the footgun, that’s the point. When you delete a row around the same time as you reuse its id (which *can* happen – otherwise you wouldn’t have the problem of collisions in the first place), there’s the potential for some concurrent query that intended to operate on what was deleted to operate on the new row instead. You can avoid a *bug* by having everyone who runs queries keep this issue in mind at all times, but it’s hard and the consequences are bad (very rare and nondeterministic silent data corruption), and that’s what makes it a footgun. – Ry- Apr 06 '23 at 17:14
  • @Ry- If I understand correctly, is the footgun you are talking about due to the reuse of primary key/id values (caused by the use of the CYCLE keyword)? Essentially, that if somebody persists/stores an id/primary key value it is almost impossible to distinguish if it was intended to act on the entry from before or after the cycle wrap around? – Chris Malek Apr 06 '23 at 22:43
  • @ChrisMalek: Yes, that. And not only if explicitly persisting, but just with concurrency in general. Or some query that comes in unexpectedly late because of a chain of delays… – Ry- Apr 06 '23 at 23:59
  • @Ry- I believe your data integrity argument represents the most compelling reason to abandon my choice of workaround. Speed or space requirements don't matter if the choice of workaround is going to lead users or developers into data corruption. If you would like to make your comment an official answer, I would be more than happy to choose it as the accepted answer. You are welcome to steal my example from the question if you feel it correctly explains the problem. (and then I can delete it from the question section). – Chris Malek Apr 08 '23 at 10:36

1 Answer


Personally, I have always and will always solve this problem using bigint/bigserial.

According to the documentation, bigserial gives you 9,223,372,036,854,775,807 values. Sure, it may wrap at some point but let us be realistic for a minute.

Let us go big and imagine your database reserves 1M values per second. It will take you a little more than 292,471 years to use them all. Even if you somehow managed to use 1 billion values per second, that still leaves more than enough time to turn it into your great-great-...-great-grandchildren's problem, not yours.

I do not see any way a database could support saving 1 billion records per second for almost 300 years. The only possible way you could reserve so many values on the sequence would be by discarding over 99% of them, in which case I would say your design has big issues.

IMHO, you can safely assume a sequence over bigint will never wrap back (as long as you do not discard an insane number of generated values).
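If you want to sanity-check that figure, the arithmetic is a one-liner (1M values per second, 365-day years):

    -- (2^63 - 1) values consumed at 1,000,000 per second, expressed in years.
    SELECT 9223372036854775807::numeric / 1000000 / (365 * 24 * 60 * 60) AS years;
    -- ≈ 292471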


Edit:

As I feel, from the comments left under the question and this answer, that I am not getting through, let me comment on the 5 solutions presented:

  1. Contrary to what is mentioned in the question, and as I explain above, a BIGINT sequence requires so many records to be inserted before cycling that it will realistically never happen.

  2. UUIDs could be a solution for having unique ids without having to care about cycling. They are the solution of choice for a sharded database, or if you show your primary key to users.
    However, data security is off-topic in a SQL question about avoiding primary key conflicts: if security mattered, UUIDs would have to be chosen even if they generated more collisions than other solutions, and in any case a table keyed on a UUID will "never" have PK conflicts (the actual probabilities have been answered there), so there is nothing to improve.
    Limiting the answer to conflict resolution only and setting security aside, solution 1 will not cycle either, and it is simpler and faster.
    -> UUIDs should be ruled out (except for security reasons, off-topic here).

  3. The only purpose of ON CONFLICT DO NOTHING is to avoid receiving an error message. However, an error message (outside of a syntax error when manually typing a query) is simply the database's way of informing the client application and should be considered a good thing. Rather than testing the number of records that were inserted, simply catch the error and interpret its code according to the table provided in the documentation (see the sketch after this list).
    -> To be ruled out.

  4. ON CONFLICT DO UPDATE: the same remark as above applies, plus it will destroy existing data that should perhaps be left untouched.
    -> To be ruled out.

  5. The TRIGGER solution can easily be tested. The script below simulates a sequence conflict on a table containing 10k values.

    CREATE TABLE A (
        ID BIGSERIAL PRIMARY KEY
    );
    INSERT INTO A SELECT generate_series(1,10000);
    /* The sequence cycling back to 1 is simulated here: nextval() will return 1 but the table is already filled */
    CREATE OR REPLACE FUNCTION fix_id_collision()
    RETURNS TRIGGER AS $$
    DECLARE
      found_id integer;
      column_default text;
    BEGIN
      -- Loop until we find a free id value.
      LOOP
        -- Check if the id already exists in the table.
        -- Use row level "FOR UPDATE" locking to hopefully ensure
        -- that concurrent INSERT queries don't receive the same id
        -- and collide.
        EXECUTE format('SELECT id FROM %I.%I WHERE id = %L FOR UPDATE', TG_TABLE_SCHEMA, TG_TABLE_NAME, NEW.id) INTO found_id;
        IF found_id IS NULL THEN
          RETURN NEW;
        END IF;
        RAISE NOTICE 'Id taken %', NEW.id;
        -- Re-evaluate the column default of the id column to get the next candidate id.
        EXECUTE format('SELECT column_default FROM information_schema.columns WHERE table_schema=%L AND table_name=%L AND column_name=%L', TG_TABLE_SCHEMA, TG_TABLE_NAME, 'id') INTO column_default;
        EXECUTE format('SELECT %s', column_default) INTO NEW.id;
        RAISE NOTICE 'Trying with id %', NEW.id;
      END LOOP;
    END;
    $$
    LANGUAGE plpgsql;

    CREATE TRIGGER id_collision_trigger_A BEFORE INSERT ON A FOR EACH ROW EXECUTE FUNCTION fix_id_collision();
    

    I added some RAISE NOTICE statements that can help show how concurrent queries behave.
    First comment: I do not see the point of locking the records, since nothing is going to happen to them. If anything, doing so prevents the locked records from being updated or deleted concurrently, while being useless for helping other clients insert new records.
    Secondly, a simple INSERT INTO A VALUES(default) with, I repeat, only 10k values in the table (and 10k conflicts to resolve) takes about 30 seconds to complete. On this test sample the next queries will be faster, since there are no more conflicts to resolve, but in that case a sequence without a trigger works just as well.
    Outside of the test case I built above, in a situation where you expect conflicts to arise often, the performance is abysmal, and I cannot imagine what would happen with more records in the table or a longer run of consecutive values in the primary key.
    -> To be ruled out.
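Regarding point 3 above, here is a minimal sketch of letting the duplicate-key error surface and interpreting its code, rather than suppressing it with ON CONFLICT DO NOTHING; the conflict_demo table is invented for the example, and the condition name unique_violation corresponds to SQLSTATE 23505:

    -- Sketch: let the duplicate-key error reach PL/pgSQL (or the client) and
    -- interpret it there instead of silently discarding the insert.
    CREATE TABLE conflict_demo (id integer PRIMARY KEY);
    INSERT INTO conflict_demo VALUES (1);
    DO $$
    BEGIN
      INSERT INTO conflict_demo VALUES (1);  -- collides with the existing row
    EXCEPTION
      WHEN unique_violation THEN
        RAISE NOTICE 'duplicate key, SQLSTATE %', SQLSTATE;
    END;
    $$;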

At the risk of repeating my point: solutions 1 and 2 are the only valid ones. Since solution 1 also happens to be the simplest and the fastest, and since I am limiting my answer to id collisions (not security), it is definitely the correct way to go.

Solution 5:

  • albeit eventually solving conflicts too,
  • uses orders of magnitude more resources than solution 1, in the form of processor time and disk reads, making it orders of magnitude slower, to the point that it feels like queries simply do not work.
  • is useless for anything other than a sequence cycling:
    • For UUIDs, even if you are unlucky enough to generate a collision, you are better off letting 1 insertion out of trillions fail rather than executing a trigger on all the insertions,
    • For custom-generated values, you should improve how the values are generated to avoid collisions before they reach the database.

It is not the way to go.

Atmo
  • As described in my question, I am not looking to debate the various workarounds. Switching to a larger data type is obviously an option, but it also comes at the cost of increased storage requirements. Storage may be cheap in general but depending on a developer's constraints they may still feel the crunch of storage limitations. For example, if they are implementing something on a small embedded device. ;) It depends on their requirements. – Chris Malek Apr 05 '23 at 16:05
  • Then my feedback to solution 5 is this: it is painfully slow. If you talk about embedded device, you also have to account for the limited computing power and possibly limited autonomy on battery. Things that, IMHO, make solution 5 unsuitable for them as well. Adding 4 bytes to every record usually means increasing the table size by a few percents only. A trigger, especially on large table, could deteriorate performance by a factor of more than 2 (>100% slower). – Atmo Apr 05 '23 at 16:11
  • If you have a requirement about not using a sequence, perhaps you should not have written a table that contains a sequence, saying you want to manage the cycle of a sequence. Everything in your question revolves around the fact you can use a sequence for that table, hence the solution is to increase its size beyond any possibility that it will ever cycle. My answer clearly states option 2 is a valid solution but it provides no benefits over option 1 (in the context of your question) whereas option 1 does provide additional benefits over option 2, so should be preferred. – Atmo Apr 05 '23 at 17:33
  • You seem to believe speed is the only requirement that ever matters. If speed is your main requirement then yes, definitely option 1 is probably your best bet. If you are more concerned about security then option 2 can be a better one (please refer to the 2nd link in my question for their explanation). Or if you are more concerned about space then option 5 could be the better option. As I have contended from the beginning, your choice should depend on your individual requirements and testing. My question is not about which is the better option – Chris Malek Apr 05 '23 at 17:44
  • 2
    @ChrisMalek - *Switching to a larger data type (...) also comes at the cost of increased storage requirements.* Not necessary, see [Benchmark: bigint vs int on PostgreSQL](https://stackoverflow.com/a/38054689/1995738) – klin Apr 05 '23 at 18:55
  • Thanks @klin. I was not aware of that possibility. That would indeed make switching worthwhile in our case. I will have to run some tests to confirm but this would be great news. – Chris Malek Apr 06 '23 at 10:50
  • @klin I have run a very very basic test and can confirm that there is only a 4% increase in space usage between using integer and bigint in my test. This is close enough for me to consider the difference insignificant enough to switch between the two. I will have to look for space savings elsewhere. Thanks again for the info. – Chris Malek Apr 11 '23 at 16:28