12

I need to generate unique random numbers in Postgresql with a fixed length of 13 digits. I've found a similar thread where was used a sequence encrypted using "pseudo_encrypt", but the returned number was not with a fixed length.

So, what i need is: get an encrypted random sequence with a fixed length of 13 digits where the min value is 0000000000001 and a max value is 9999999999999.

Is it possible? If start with the zeros in front is not possible is not a big problem (i think), i can set them programmatically during the reading from the db, but would be great if Postgresql can do it by itself.

-- EDIT --

After have realized some useful things i must change the question in order to explain better what i need:

I need to generate unique random numbers (bigint) in Postgresql with a fixed max length of 13 digits. Actually i'm trying to use pseudo_encrypt function (64 bit), but the returned number obviously is not with a fixed max length of 13, in the 32 bit case the max length is 10 digits (int), and for the 64 bit is 19 (bigint).

So, how to get an encrypted random sequence with a fixed max length of 13 digits, where the min value is 1 and the max value is 9999999999999 ?

Is it possible to modify the 64 bit pseudo_ecrypt function in order to get this result? Or if is not possible, there are other methods to get an unique sequence with this requirements?

Pseudo Encrypt function (64bit)

CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE bigint) returns bigint   AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= (VALUE >> 32) & 4294967295::bigint;
r1:= VALUE & 4294967295;
WHILE i < 3 LOOP
    l2 := r1;
    r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::int;
    l1 := l2;
    r1 := r2;
    i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;
Community
  • 1
  • 1
MattC
  • 447
  • 6
  • 21
  • 1
    You can format the returned value from that function to have a fixed length: `to_char(pseudo_encrypt(nextval('seq')::int), '0000000000000')` –  Nov 17 '15 at 15:32
  • And what about the sequence? How i have to set it? – MattC Nov 17 '15 at 15:43
  • The same as in the question you linked to. Just make sure you have a maximum value that does not exceed the 13 digits. –  Nov 17 '15 at 15:45
  • Ok, i try and i let you know, thank you! – MattC Nov 17 '15 at 15:45
  • don't know why but the string is generated with a space char at the begin, i have used the pseudo_encrypt function from the wiki page http://wiki.postgresql.org/wiki/Pseudo_encrypt – MattC Nov 17 '15 at 16:21
  • Use the `TM` modifier for the `to_char()` function: http://www.postgresql.org/docs/current/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIMEMOD-TABLE –  Nov 17 '15 at 17:16
  • 2
    Can I be the first to pedantically state that if the values are unique then by definition they cannot also each be random. Thank you. – David Aldridge Nov 17 '15 at 19:11
  • @DavidAldridge: I think it's more like "obfuscated" uniqueness so that the next number can't be guessed or a pattern in the numbers can't be exploited –  Nov 17 '15 at 20:27
  • @a_horse_with_no_name i have got errors if i use TM as modifier, seems is not recognized, do you think i can set a fixed length of 13 also for integer/bigint types using pseudo_encrypt? How can i do? Think about it are better indexed than strings. – MattC Nov 18 '15 at 09:38
  • The `TM` should work. What is the query you used and what is the error you got? –  Nov 18 '15 at 09:40
  • id_number character varying(30) DEFAULT TM to_char(pseudo_encrypt(nextval('seq')::int), '0000000000000'), but also TMto_char, sorry but i'm not good in sql, maybe is a silly mistake.. However, at the end thinking about it what i need is a bigint, because i need a number (int) not a string. It's my mistake i should specify before. – MattC Nov 18 '15 at 09:47
  • @a_horse_with_no_name Anyway also your to_char solution is not good for get a number with a range 1 to 9999999999999, because the pseudo_encrypt function return a max value of 2^32-1 (4294967295) that is 10 digits, in this case at least the firsts 3 digits will be "000" followed by the rest of number. I still need a solution. – MattC Nov 18 '15 at 11:09
  • Would it fit the problem if the number of values was `2^42 = 4398046511104` ? That means 13 decimal digits but less than half of `9999999999999` values ? – Daniel Vérité Nov 18 '15 at 19:09
  • @a_horse_with_no_name, my answer missed requirements, i deleted it. – Kirk Roybal Nov 18 '15 at 22:16
  • @DanielVérité I think that would be fine... But suppose that after a while 4398046511104 will no longer be enough due to exhaustion in the database, can i switch from 2^42 to 2^44 without the risk of collisions? However, how can i do for set 2^42 and 2^44? Thank you. – MattC Nov 19 '15 at 08:54

1 Answers1

13

Tweaking the existing function for N < 64 bits values

It's relatively simple to tweak the bigint variant to reduce the output to 2^N values, where N is even, and less than 64.

To get 13 decimal digits, consider the maximum N for which 2^N has 13 digits. That's N=42, with 2^42=4398046511104.

The algorithm works by breaking the input value into two halves with an equal number of bits, and make them flow through the Feistel network, essentially XOR'ing with the result of the round function and swapping halves at each iteration.

If at every stage of the process, each half is limited to 21 bits then the result combining both halves is guaranteed not to exceed 42 bits.

So here's my proposed variant:

CREATE OR REPLACE FUNCTION pseudo_encrypt42(VALUE bigint) returns bigint
 AS $$
DECLARE
  l1 bigint;
  l2 bigint;
  r1 bigint;
  r2 bigint;
  i int:=0;
  b21 int:=(1<<21)-1; -- 21 bits mask for a half-number => 42 bits total
BEGIN
  l1:= VALUE >> 21;
  r1:= VALUE & b21;
  WHILE i < 3 LOOP
    l2 := r1;
    r2 := l1 # (((((1366*r1+150889)%714025)/714025.0)*32767*32767)::int & b21);
    l1 := l2;
    r1 := r2;
    i := i + 1;
  END LOOP;
  RETURN ((l1::bigint << 21) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

The input must be less than (2^42)-1, otherwise the outputs will collide , as pseudo_encrypt42(x) = pseudo_encrypt42(x mod 2^42).

What can be done about the missing numbers between 2^42 and 10^13 ?

2^42 - 10^13 = 5601953488896 so that's quite a lot of missing numbers. I don't know how to help with that in a single pass with the Feistel network. One workaround that might be acceptable, though, is to generate another set of unique values in 0..M and add 2^42 to them, so there's no risk of collision.

This another set could be obtained by the same function, just with the offset added. 4398046511104 + pseudo_encrypt42(x) is guaranteed to be between 4398046511104 and 2*4398046511104 = 8796093022208 unique values so that's closer to the goal. The same technique could be applied with several other ranges, not even necessarily of the same size.

However this workaround degrades the random-looking behavior , as instead of having a single output range where every number can be between 0 and X, you'd get N distinct output ranges of X/N numbers. With several distinct partitions like that, it's easy to guess in what partition the output will be, just not what value inside the partition.

Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
  • 1
    This solves my problem, thank you Daniel! It's a very detailed response, i'm sure that this will be very useful also to other developers! – MattC Nov 20 '15 at 13:14
  • 1
    I think that in your case input must be less than `(2^21)` to avoid collisions, because you are using `VALUE >> (64-21)` for left part (it's like take 21 bits from left). In this case for values more than `2^21` and less than something you'll still get zero as left part but the right part will start from zero again. To fix it I used `VALUE >> 21` shift for left part (it's like take next 21 bits after first 21 bits) then your restriction `(2^42)` became valid. – plomovtsev Jul 26 '16 at 13:39
  • @mopdobopot: you're right, good catch. I've fixed the code in the answer. – Daniel Vérité Jul 26 '16 at 14:23