48

I'm using Postgres 9.5 and seeing some weird things here.

I have a cron job running every 5 minutes that fires a SQL statement to add a list of records if they don't already exist.

INSERT INTO 
   sometable (customer, balance)
VALUES
    (:customer, :balance)
ON CONFLICT (customer) DO NOTHING

sometable.customer is a primary key (text)

sometable structure is:
id: serial
customer: text
balance: bigint

Now it seems like every time this job runs, the id field is silently incremented by 1. So the next time I really add a record, it is thousands of numbers above my last value. I thought this query checks for conflicts and, if so, does nothing, but currently it seems like it tries to insert the record, increments the id and then stops.

Any suggestions?

Christian
  • This is by design: http://postgresql.nabble.com/BUG-14126-INSERT-ON-CONFLICT-DO-NOTHING-auto-increments-serial-primary-key-when-no-insert-happens-td5902224.html But you don't need to worry about those gaps, they are not a problem. The **only** thing a sequence guarantees is that it never generates the same number twice. –  May 13 '16 at 08:21
  • If, for some reason, you actually *need* a gapless sequence (a rare requirement), then Postgres's `SERIAL`/`SEQUENCE` support is not what you should be using. See [this answer from Craig Ringer](http://stackoverflow.com/a/9985219/157957). – IMSoP May 13 '16 at 09:11
  • The gaps in the sequence are a problem if you are partitioning another table based on hashing that key. No way to guarantee a balanced set of partitions. I am facing this problem right now. – egalot Jul 31 '19 at 04:23
  • I feel like someone should have pointed you toward UUID keys as an alternative to serial keys. They very likely would solve your problem. – sage88 Mar 01 '21 at 22:03
  • The gaps aren't a problem for me, but I'm wanting to use a `smallserial` and the insert conflicts are blowing out the max size of the `smallint` even though in reality there are way fewer than 32k rows. With 32 parallel processes loading data, the conflicts reach the max within a few seconds... (this table has billions of rows, hence the desire to convert a ~100 char varchar down to a `smallint` since there are only ~20k distinct values) – J. Dimeo Apr 06 '21 at 05:16
  • @J.Dimeo: have you ever found a working solution? Having the exact same problem – Geert-Jan Dec 30 '22 at 23:54
  • @Geert-Jan unfortunately no. IIRC, I just managed this lookup table of 20k distinct values in memory/in application code :-( – J. Dimeo Jan 01 '23 at 03:10

4 Answers

51

The reason this feels weird to you is that you are thinking of the increment on the counter as part of the insert operation, and therefore the "DO NOTHING" ought to mean "don't increment anything". You're picturing this:

  1. Check values to insert against constraint
  2. If duplicate detected, abort
  3. Increment sequence
  4. Insert data

But in fact, the increment has to happen before the insert is attempted. A SERIAL column in Postgres is implemented as a DEFAULT which executes the nextval() function on a bound SEQUENCE. Before the DBMS can do anything with the data, it's got to have a complete set of columns, so the order of operations is like this:

  1. Resolve default values, including incrementing the sequence
  2. Check values to insert against constraint
  3. If duplicate detected, abort
  4. Insert data
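
To make step 1 concrete, this is roughly what the question's id SERIAL column expands to under the hood (a sketch: sometable_id_seq is the name Postgres would generate by default, and the UNIQUE constraint here stands in for whatever constraint backs ON CONFLICT (customer) in the question):

-- Rough equivalent of "id SERIAL": a plain integer column whose DEFAULT
-- pulls the next value from a dedicated sequence
CREATE SEQUENCE sometable_id_seq;
CREATE TABLE sometable (
    id       integer NOT NULL DEFAULT nextval('sometable_id_seq'),
    customer text    UNIQUE,
    balance  bigint
);
ALTER SEQUENCE sometable_id_seq OWNED BY sometable.id;

Because that DEFAULT is resolved while the candidate row is being assembled, nextval() has already run by the time the conflict check happens.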

This can be seen intuitively if the duplicate key is in the autoincrement field itself:

CREATE TABLE foo ( id SERIAL NOT NULL PRIMARY KEY, bar text );
-- Insert row 1
INSERT INTO foo ( bar ) VALUES ( 'test' );
-- Reset the sequence so the next nextval() hands out 1 again
SELECT setval(pg_get_serial_sequence('foo', 'id'), 1, false);
-- Attempt to insert row 1 again
INSERT INTO foo ( bar ) VALUES ( 'test 2' )
     ON CONFLICT (id) DO NOTHING;

Clearly, this can't know if there's a conflict without incrementing the sequence, so the "do nothing" has to come after that increment.
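
You can watch the same thing with the table from the question (a sketch, assuming the default sequence name sometable_id_seq and a customer value that already exists):

-- No row is written because the customer already exists...
INSERT INTO sometable ( customer, balance ) VALUES ( 'acme', 100 )
     ON CONFLICT (customer) DO NOTHING;
-- ...but the sequence has still advanced
SELECT last_value, is_called FROM sometable_id_seq;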

IMSoP
  • Great explanation! Do you know if this is documented somewhere in PostgreSQL's doc? I searched but couldn't find anything. – GG. Nov 04 '21 at 18:29
  • @GG. [The Serial pseudo-type is documented here](https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-SERIAL) and explains that it's actually implemented as a default on the column, calling a function; but the [documentation for ON CONFLICT clauses](https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT) doesn't explicitly mention that defaults will be resolved first. Perhaps you could [propose an edit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/ref/insert.sgml). – IMSoP Nov 04 '21 at 19:03
  • Great answer. I noticed there were large gaps in our primary keys. I wasn't sure if there was some error being made on our part, but we're making heavy use of ON CONFLICT. So this explains the observations. Also note that there is a CYCLE option for sequences that enables wrap-around behavior when it exceeds the max. – Matt Hancock Feb 18 '22 at 21:41
5

As already said by @a_horse_with_no_name and @Serge Ballesta, serials are always incremented even if the INSERT fails.

You can try to "roll back" the serial value to the maximum id in use by resetting the corresponding sequence:

SELECT setval('sometable_id_seq', MAX(id), true) FROM sometable;
Adam
  • Note that passing `true` as the third parameter to `setval` is the same as not passing a third parameter. `setval('sometable_id_seq', MAX(id), true)` is equivalent to `setval('sometable_id_seq', MAX(id))` – BrDaHa May 24 '17 at 21:07
1

As said by @a_horse_with_no_name, that is by design. Serial type fields are implemented under the hood through sequences, and for obvious reasons, once you have gotten a new value from a sequence, you cannot roll it back. Imagine the following scenario:

  • sequence is at n
  • transaction A requires a new value: it gets n+1
  • in a concurrent transaction, B requires a new value: it gets n+2
  • for any reason, A rolls back its transaction - would you feel safe resetting the sequence?

That is the reason why the documentation for sequences (and serial fields) simply says that, when transactions are rolled back, holes can occur in the returned values. Only uniqueness is guaranteed.
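
A minimal sketch of that scenario with the question's table (customer values are just illustrative):

BEGIN;
-- consumes one value from the sequence
INSERT INTO sometable (customer, balance) VALUES ('alice', 100);
-- the consumed value is NOT handed back
ROLLBACK;
-- gets the next value, so this row ends up with a gap before its id
INSERT INTO sometable (customer, balance) VALUES ('bob', 200);
SELECT id, customer FROM sometable;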

Serge Ballesta
  • Yeah thanks I got that, but from my point of view a +1 doesn't make sense if DO NOTHING is specified in the query. It should check whether it needs to add anything before actually adding it. Also I am worried about the gaps. This job runs every 5 mins... – Christian May 13 '16 at 08:40
  • so the only thing I can do is actually check myself whether this record needs to be added, by selecting all of them :-/ – Christian May 13 '16 at 08:41
  • @Christian: there is nothing you _need_ to do. Just ignore the gaps. There is nothing wrong with ID values having (large) gaps –  May 13 '16 at 08:43
  • @a_horse_with_no_name Unless you are inserting enough data to overflow a `SERIAL` in a "short" (purposefully vague) amount of time. I was going to do this for a query that runs ~50 times/second, but I would run out of unique values in roughly 1.36 years. I wish instead of being forced to use `BIGSERIAL` there was a command like `SELECT id FROM table WHERE uniqueCol = 'val' ON EMPTY INSERT INTO table (uniqueCol) VALUES ('val') ON CONFLICT(uniqueCol) DO UPDATE SET uniqueCol = EXCLUDED.uniqueCol RETURNING id`. That would significantly reduce the number of wasted ids. – Jeff G Oct 11 '16 at 00:12
  • This sounds like a use case for a UUID rather than a serial key. – sage88 Mar 01 '21 at 21:59
-4

Well, there is a technique that allows you to do stuff like that. It's called an insert mutex. It is old, old, old, but it works.

https://www.percona.com/blog/2011/11/29/avoiding-auto-increment-holes-on-innodb-with-insert-ignore/

The general idea is that you do an INSERT ... SELECT, and if your values already exist the SELECT returns no rows, which of course prevents the INSERT, so the sequence is not incremented. A bit mind-boggling, but perfectly valid and performant.

This of course completely ignores ON DUPLICATE, but one gets back control over the auto-increment counter.
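
Translated to the question's table, the same idea in Postgres would look roughly like this (a sketch; note that two concurrent sessions can still race, so the unique constraint on customer remains the real safety net):

-- The SELECT returns no row for an already existing customer,
-- so nothing is inserted and nextval() is never called
INSERT INTO sometable (customer, balance)
SELECT :customer, :balance
WHERE NOT EXISTS (
    SELECT 1 FROM sometable WHERE customer = :customer
);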

sp3c1