9

I need a primary key for a PostgreSQL table. The ID should consist out of a number from about 20 numbers.

I am a beginner at database and also worked not with PostgreSQL. I found some examples for a random id, but that examples where with characters and I need only an integer.

Can anyone help me to resolve this problem?

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
user3049045
  • 111
  • 1
  • 1
  • 5
  • 2
    Why does the integer need to be random instead of regular seeded, incrementing integer? Random won't (especially out of 20) guarantee uniqueness. This clarification of requirements could help in determining the solution. – MikeSmithDev Jan 02 '14 at 19:19
  • Do you mean 20 numbers? Or *20 digit numbers*? i.e. do you want to pick a random number from 0 to 20, or pick a number like 123458888812111142123? – Craig Ringer Jan 02 '14 at 19:22

1 Answers1

22

I'm guessing you actually mean random 20 digit numbers, because a random number between 1 and 20 would rapidly repeat and cause collisions.

What you need probably isn't actually a random number, it's a number that appears random, while actually being a non-repeating pseudo-random sequence. Otherwise your inserts will randomly fail when there's a collision.

When I wanted to do something like this a while ago I asked the pgsql-general list, and got a very useful piece of advice: Use a feistel cipher over a normal sequence. See this useful wiki example. Credit to Daniel Vérité for the implementation.

Example:

postgres=# SELECT n, pseudo_encrypt(n) FROM generate_series(1,20) n;
 n  | pseudo_encrypt 
----+----------------
  1 |     1241588087
  2 |     1500453386
  3 |     1755259484
  4 |     2014125264
  5 |      124940686
  6 |      379599332
  7 |      638874329
  8 |      898116564
  9 |     1156015917
 10 |     1410740028
 11 |     1669489846
 12 |     1929076480
 13 |       36388047
 14 |      295531848
 15 |      554577288
 16 |      809465203
 17 |     1066218948
 18 |     1326999099
 19 |     1579890169
 20 |     1840408665
(20 rows)

These aren't 20 digits, but you can pad them by multiplying them and truncating the result, or you can modify the feistel cipher function to produce larger values.

To use this for key generation, just write:

CREATE SEQUENCE mytable_id_seq;

CREATE TABLE mytable (
    id bigint primary key default pseudo_encrypt(nextval('mytable_id_seq')),
    ....
);

ALTER SEQUENCE mytable_id_seq OWNED BY mytable;
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 4
    Should you need the 64 bits variant (or 19 decimal digits), get it from [pseudo_encrypt() function in plpgsql that takes bigint](http://stackoverflow.com/a/12761795/238814) – Daniel Vérité Jan 03 '14 at 18:57
  • 1
    how would you seed the function? – nakhli Jun 05 '15 at 14:40
  • 1
    @Sinbadsoft.com It isn't an iterative pseudo random generator. It doesn't take a "seed" as such. It maps inputs to outputs 1:1 in a random *seeming* manner. – Craig Ringer Jun 05 '15 at 14:51
  • @CraigRinger I get that it is a pseudo random generator, and as such I expect it to take a seed. http://en.wikipedia.org/wiki/Pseudorandom_number_generator – nakhli Jun 05 '15 at 15:11
  • 1
    @Sinbadsoft.com Er... I just said that it is *not* an iterative pseudo random generator. It's a feistel cipher or network, a 1:1 mapping of input to output. The network parameters (in the function body) control the particular mapping used and are as close to a "seed" as it has. – Craig Ringer Jun 05 '15 at 23:12
  • @CraigRinger sorry I didn't read your comment correctly! What parameters would you tune to change the sequence values? – nakhli Jun 06 '15 at 19:14
  • @Sinbadsoft.com I haven't needed to, so unsure. Take a look at the originally linked page (https://wiki.postgresql.org/wiki/Pseudo_encrypt) and its reference to the underlying http://en.wikipedia.org/wiki/Feistel_cipher . – Craig Ringer Jun 07 '15 at 08:55
  • Will have a look at the internals of the function then, thank you for the pointers @CraigRinger. – nakhli Jun 08 '15 at 08:10
  • Two things: In the example above, generate_series() == 1,2,3... is effectively what @CraigRinger is referring to as a seed. Secondly, even with a scheme like this, there will eventually be collisions if you have two independent generators. What matters is how these random numbers behave when there is a collision. If you have F(x) == F(y) then x==y so the next values, F(x+1) and F(y+1) will also be the same and so as soon as you have a collision, the streams are going to be identical. With a true random number generator you will have (extremely rare) collisions but they will be one-offs. – Max Murphy Dec 30 '17 at 11:37
  • I assume this means that if the cipher is ever leaked, then it will be fairly trivial to convert back to the sequential keys. If this is an unacceptable risk, is there a good way to do this with a proper PRNG? – semicolon Dec 19 '19 at 02:47
  • @semicolon Correct, since there's deliberately a 1:1 mapping of input to output. If you want to use a PRNG you can, but you must then be prepared to handle unique violations and retry with a new key. Or just use UUID and the amazing power of the 128 bit space. – Craig Ringer Dec 19 '19 at 05:43
  • @CraigRinger Since we want to base58 encode these id's for external usage, I think we will have to use a PRNG. Most likely a table to store all the keys generated so far, that other tables have a foreign key constraint to. Is it possible to have a `default` value that generates a random number that is not already in the table (via retrying)? – semicolon Dec 19 '19 at 06:25
  • @semicolon Makes sense. I used the approach above because I didn't care about the original IDs being secret, just not immediately obviously sequential. Re the DEFAULT - not really, you'd be very deadlock prone. Use a separate recursive writeable CTE or something to do it. – Craig Ringer Dec 20 '19 at 04:05
  • @CraigRinger Yeah that makes sense. Do you know where I could find a good guide on how to use a recursive writeable CTE to implement this? My SQL-fu is not at that point yet. – semicolon Dec 20 '19 at 04:27