7

I would like to use the pseudo_encrypt function mentioned a few times on StackOverflow to make my IDs look more random: https://wiki.postgresql.org/wiki/Pseudo_encrypt

How can I customize this to output unique "random" numbers for just me. I read somewhere that you can just change the 1366.0 constant, but I don't want to take any risks with my IDs as any potential ID duplicates would cause major issues.

I really have no idea what each constant actually does, so I don't want to mess around with it unless I get some direction. Does anyone know which constants I can safely change?

Here it is:

CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" int) RETURNS int     IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
    l1:= ("VALUE" >> 16) & 65535;
    r1:= "VALUE" & 65535;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
        r1 := l2;
        l1 := r2;
        i := i + 1;
END LOOP;
RETURN ((l1::int << 16) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;

for bigint's

CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" bigint) RETURNS bigint IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
    l1:= ("VALUE" >> 32) & 4294967295::bigint;
    r1:= "VALUE" & 4294967295;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::bigint;
        r1 := l2;
        l1 := r2;
        i := i + 1;
    END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;
avian
  • 1,693
  • 3
  • 20
  • 30

2 Answers2

8

Alternative solution: use different ciphers

Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.

For 32 bit outputs, Skip32 in plpgsql will encrypt its input with a 10 bytes wide key, so you just have to choose your own secret key to have your own specific permutation (the particular order in which the 2^32 unique values will come out).

For 64 bit outputs, XTEA in plpgsql will do similarly, but using a 16 bytes wide key.

Otherwise, to just customize pseudo_encrypt, see below:

Explanations about pseudo_encrypt's implementation:

This function has 3 properties

  • global unicity of the output values
  • reversability
  • pseudo-random effect

The first and second property come from the Feistel Network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366 and also 150889 and 714025.

Make sure when changing f(r1) that it stays a function in the mathematical sense, that is x=y implies f(x)=f(y), or in other words the same input must always produce the same output. Breaking this would break the unicity.

The purpose of these constants and this formula for f(r1) is to produce a reasonably good pseudo-random effect. Using postgres built-in random() or similar method is not possible because it's not a mathematical function as described above.

Why these arbitrary constants? In this part of the function:

r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;

The formula and the values 1366, 150889 and 714025 come from Numerical recipes in C (1992, by William H.Press, 2nd ed.), chapter 7: random numbers, specifically p.284 and 285. The book is not directly indexable on the web but readable through an interface here: http://apps.nrbook.com/c/index.html .It's also cited as a reference in various source code implementing PRNGs.

Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran) is:

jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im;  /* normalize into the 0..1 range */

where jran is the current random integer.

This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia, ic and im have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.

ia=1366, ic=150889 and im=714025 is one of the entries for a period of 229 bits, which is way more than needed.

Finally the multiplication by 32767 or 215-1 is not part of the PRNG but meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the blocksize of the algorithm.

Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
  • Wow, great response! So if I change let's say ia=1300 instead of 1366 it would still work and make it slightly harder to figure out because I'm not using constants used in publicly available examples? – avian Jun 08 '15 at 04:09
  • @Hawk: yes, it will work. The PRNG quality will theorically suffer, but given how it's used, it won't probably be noticeable in the function results. – Daniel Vérité Jun 08 '15 at 09:58
  • @Daniel: Your implementation on the wiki was edited recently. It now appears to swap L and R *twice* per iteration (effectively just looping `l1 := l1 # f(r1)`, with `r1` never changing). Am I missing something? Does this new version make any sense to you? – Nick Barnes Jun 09 '15 at 11:38
  • @NickBarnes: well spotted. I dislike that change because it reduces the dispersion in the output. Now results from consecutive inputs seem a bit "grouped" together. The initial intention was precisely to avoid that as much as possible, in addition to totally avoid collisions. I don't think it breaks the promise on collisions, though. – Daniel Vérité Jun 10 '15 at 13:36
  • @Daniel: It looks a little worse than just reduced dispersion. Since it never exchanges L and R, only 15 bits of the input get "encrypted". And the three iterations now have the effect of applying, reversing, and reapplying the first round of the cipher... The one upside I can see is that it is now a self-inverse, but it seems there are [far better ways to do that](http://stackoverflow.com/a/29962222/1104979). I have reverted the edit. – Nick Barnes Jun 10 '15 at 15:57
  • @NickBarnes: I agree. It was no longer a feistel network, that's not acceptable as an edit. – Daniel Vérité Jun 11 '15 at 16:09
  • @DanielVérité skip32 looked like the best option for me but it produces negative numbers. What's the best option for an int4 positive-only integer output? (0-2147483647). I imagine the scalar sizing would need adjusting to halve the period. An alternative seems to be changing the return type to int8 and just adding 2147483648 to the end of the calculation to offset it to positive. – Brad M Jun 11 '16 at 01:56
  • @BradM: adding at the end works but then the max becomes `2^32-1`, when you want `2^31-1`. To reduce the output range of `pseudo_encrypt`, see if http://stackoverflow.com/a/33811338/238814 helps. – Daniel Vérité Jun 11 '16 at 13:43
3

This function looks like a blockcipher based on a Feistel network - but it's lacking a key.

The Feistel construction is bijective, i.e. it guarantees that there are no collisions. The interesting part is: r2 := l1 # f(r1). As long as f(r1) only depends on r1 the pseudo_encrypt will be bijective, no matter what the function does.

The lack of key means that anybody who knows the source code can recover the sequential ID. So you're relying on security-though-obscurity.

The alternative is using a block cipher which takes a key. For 32 bit blocks there are relatively few choices, I know of Skip32 and ipcrypt. For 64 bit blocks there are many ciphers to choose from, including 3DES, Blowfish and XTEA.

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
  • Thanks for the thoughtful response. I understand that if someone knows the source code that they will be able to reverse engineer the IDs, but my concern is there a way for me to change some of the constants in this public source code so that they can only reverse engineer it if they guess the constants correctly? ... for example, can I change anything in this line so the output is different but still no collisions: r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int; – avian Jun 07 '15 at 18:15
  • @Hawk No matter which numbers you choose, it will still be reversible. But if you choose bad numbers the output will have stronger patterns. I'd rather use one of the algorithms I mentioned instead of tweaking your function. – CodesInChaos Jun 07 '15 at 19:54
  • I totally get you regarding it being reversible. Given that (1) I'm not super comfortable building a skip32 or similar function in Postgres as I cannot find one via Google and (2) I'm ok if someone really tries hard they can decode, I just want to make it slightly more difficult but adjusting a few numbers. – avian Jun 07 '15 at 21:06
  • @Hawk: Postgres provides Blowfish and 3DES implementations in the [`pgcrypto` module](http://www.postgresql.org/docs/9.4/static/pgcrypto.html) – Nick Barnes Jun 08 '15 at 00:31