
I have three main constraints for int8 key generation:

  • Keys need to be unpredictably random
    • Number of rows created should not be calculable
    • Order of row creation should not be calculable
  • Keys need to be unique across the entire db
    • Future table merges should not require changing keys
    • Future Table-Per-Class superclass additions should not require changing keys
  • Keys need to be bounded
    • These keys will be base58 encoded in various places such as URLs
    • Range will be narrower at first but should be adjustable in future to meet demand

I have heard of a few approaches to tackling at least some of these constraints:

DB-wide serial8

Does not meet the first constraint, but meets the second and third constraints.
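
For concreteness, a minimal sketch of what I mean by DB-wide serial8: a single shared sequence used as the default key for every table (table names here are just placeholders):

```sql
-- One global sequence shared by every table.
CREATE SEQUENCE global_key_seq AS int8;

CREATE TABLE article (
    id    int8 PRIMARY KEY DEFAULT nextval('global_key_seq'),
    title text NOT NULL
);

CREATE TABLE topic (
    id   int8 PRIMARY KEY DEFAULT nextval('global_key_seq'),
    name text NOT NULL
);
```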

DB-wide serial8 combined with a cipher

If the cipher key is leaked or discovered at any point then this becomes just DB-wide serial8, since the cipher key can't be changed without changing all the existing keys.

UUIDs

Does not meet the last constraint due to their 128-bit nature, but meets the first and second constraints.
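
For comparison, the UUID version would be something like the following sketch (assuming the pgcrypto extension supplies `gen_random_uuid()` on this PostgreSQL version; the table name is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_uuid()

CREATE TABLE account (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text NOT NULL
);
```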

Set default to floor(random() * n)

Does not meet the second constraint due to collision risk, but meets the first and last constraints.
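
As a sketch, this approach would look roughly like the following (the range of a billion is just an example; note that `random()` is not cryptographically strong, and the PRIMARY KEY only rejects duplicates within a single table rather than re-rolling them):

```sql
CREATE TABLE post (
    id    int8 PRIMARY KEY DEFAULT floor(random() * 1000000000)::int8,
    title text NOT NULL
);
```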

Using `random()` combined with a keys base table to keep track of all the keys

This does sort of meet all the constraints. However, if a duplicate key is generated, the insertion will fail. I am also concerned about potential performance/locking issues.

If there is a nice way to make this re-roll until a non-colliding key is found, with good performance, then this would be an acceptable solution.
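
For illustration, here is one possible sketch of that re-roll idea. It assumes the pgcrypto extension for `gen_random_bytes()`, the names `object_key`, `new_object_key`, and `max_key` are made up, and modulo bias is ignored for brevity. `INSERT ... ON CONFLICT DO NOTHING` keeps the uniqueness check safe against concurrent inserts:

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Base table that records every key ever handed out.
CREATE TABLE object_key (
    id int8 PRIMARY KEY
);

CREATE OR REPLACE FUNCTION new_object_key(max_key int8 DEFAULT 1000000000)
RETURNS int8 AS $$
DECLARE
    candidate int8;
BEGIN
    LOOP
        -- 8 random bytes from a CSPRNG, reduced to the desired range.
        candidate := abs(mod(
            ('x' || encode(gen_random_bytes(8), 'hex'))::bit(64)::int8,
            max_key));
        -- Claim the key; if another session already holds it, no row
        -- is returned and we loop to try a fresh candidate.
        INSERT INTO object_key (id) VALUES (candidate)
        ON CONFLICT DO NOTHING
        RETURNING id INTO candidate;
        IF candidate IS NOT NULL THEN
            RETURN candidate;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Individual tables could then use it as a column default, e.g. `id int8 PRIMARY KEY DEFAULT new_object_key()`.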


What is the best way to generate keys in PostgreSQL that meets the three criteria I listed?

semicolon
    Use a `uuid` if you want a "random" unique value. If you want efficiency, use a sequential value. Trying to do both is not going to be simple. – Gordon Linoff Dec 19 '19 at 12:57
  • @GordonLinoff It's partially about efficiency, but also just about how huge URLs with UUIDs in them are. – semicolon Dec 19 '19 at 13:00
  • Ultimately I need to give out a <10 char key to the user, and I need to never give out the same key for two different objects, and I need to not reveal internal DB information. – semicolon Dec 19 '19 at 13:11
  • You should edit your question to answer the six questions I give in "[Unique Random Identifiers](https://peteroupc.github.io/random.html#Unique_Random_Identifiers)", as they will help you decide how best to generate unique identifiers. In your case, are the IDs the only thing that grants access to the resources they identify, or are only authorized users allowed to access them, even if they know the ID? Do the IDs have to be memorable? Also, what kinds of resources do the IDs identify, and what do you believe is the maximum number of IDs your application can have? – Peter O. Dec 19 '19 at 13:31
  • @PeterO. To the six questions linked: (1) in theory yes, they are all in a single DB. (2) no, if a duplicate is generated it must be re-rolled. (3) somewhat hard to guess, and don't reveal insertion order or row count. (4) Yes. (5) no, separate authorization. (6) No. – semicolon Dec 19 '19 at 23:00
  • @PeterO. To your questions: (1) There is separate authorization. (2) Do not have to be memorable. (3) All kinds of things: users, articles, relations, topics. (4) I'd like something flexible; long term I'd like to err on the side of assuming a quadrillion or so for safety, but medium term I can constrain it to around a billion to keep the base58-encoded keys shorter. – semicolon Dec 19 '19 at 23:09
  • Well, the only sensible way I see is a unique int8->int8 mapping, the first int8 being a DB-wide serial8, the second int8 being the key. There are several constructs which could be used for such a mapping, but the details would have to be kept private. – Severin Pappadeux Dec 21 '19 at 23:41
  • And any key design based upon random() (or a similar function) is open and could be cracked by definition - it is just an algorithm with a known and published (for a given version of PostgreSQL) implementation. – Severin Pappadeux Dec 21 '19 at 23:43
  • @SeverinPappadeux Once you crack the cipher, you can convert any desired sequential id into the "random" id, and you can also convert back. This means all the usual issues like leaking row insertion frequency and the order in which things were inserted, as well as easy scraping. You can't really "crack" `random()`, you can maybe find some patterns, but as long as `random()` keeps obtaining new entropy not too infrequently then you will NOT have the same vulnerabilities. – semicolon Dec 27 '19 at 00:11
  • @semicolon Well then, I have news for you - the random() function in PostgreSQL is an implementation of erand48 from POSIX, which is a linear congruential generator and very easy to invert. There is no entropy source in random(), none whatsoever. – Severin Pappadeux Dec 27 '19 at 00:53
  • @SeverinPappadeux That's good to know, in that case I would probably like to find a way to use a function that does actually use an entropy source. – semicolon Dec 27 '19 at 04:21
  • You could try https://www.postgresql.org/docs/current/pgcrypto.html#AEN135553, but as soon as there is a true RNG, there is a probability of collisions (see the birthday paradox mentioned by @PeterO.). – Severin Pappadeux Dec 27 '19 at 16:51
  • @SeverinPappadeux Yeah I plan on detecting collisions. Do you know what the best way to go about doing that is in PostgreSQL? – semicolon Dec 28 '19 at 11:26

1 Answer


If you expect to have no more than a billion identifiers, then generating random 64-bit numbers (so 2^64 possible values) may suit your purposes, as long as you have a way to check those numbers for uniqueness. They should be generated using a cryptographic random number generator (such as `secrets.SystemRandom` in Python or `random_int` in PHP).

With random numbers this long, the expected chance of producing a duplicate will be less than 50% after generating a billion random numbers (see "Birthday problem" for more precise formulas).
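
As a rough check with the standard birthday-problem approximation: the chance of at least one duplicate after n draws from 2^64 possible values is approximately 1 − e^(−n^2 / (2·2^64)). For n = 10^9 this works out to about 0.027, so well under the 50% figure.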

Peter O.
  • What would be the best way to go about implementing this in PostgreSQL? Is there an easy way to auto-reroll the key if insertion fails (ideally in a way that is fast and supports bulk insertion)? – semicolon Dec 19 '19 at 23:57
  • It appears that in PostgreSQL, uniqueness checks are not possible (or at least not trivial) without the risk of race conditions (see [this question](https://stackoverflow.com/questions/22618605)). Also, I'm not familiar enough with PostgreSQL to know whether PostgreSQL implements a cryptographic RNG in its SQL dialect (I suspect its `random()` isn't a cryptographic RNG). I see you have a [related question](https://stackoverflow.com/questions/59409519); maybe someone can answer there. – Peter O. Dec 20 '19 at 00:33
  • I was assuming that transactions would save me from that race condition, but now I am not so sure. I would be ok with a try-and-catch approach if that would avoid the race condition, as collisions will be very rare. – semicolon Dec 27 '19 at 00:16