97

I'm looking for a way to encrypt/obfuscate an integer ID into another integer. More precisely, I need a function int F(int x), so that

  • x<->F(x) is one-to-one correspondence (if x != y, F(x) != F(y))
  • given F(x), it's easy to find out x - so F is not a hash function
  • given x and F(x) it's hard/impossible to find out F(y), something like x ^ 0x1234 won't work

For clarity, I'm not looking for a strong encryption solution, it's only obfuscation. Imagine a web application with urls like example.com/profile/1, example.com/profile/2 etc. The profiles themselves are not secret, but I'd like to prevent casual voyeurs to view/fetch all profiles one after another, so I'd rather hide them behind something like example.com/profile/23423, example.com/profile/80980234 etc. Although database-stored tokens can do the job quite easily, I'm curious if there's some simple math available for this.

One important requirement I wasn't clear about is that results should look "random", that is, given a sequence x,x+1,...,x+n , F(x),F(x+1)...F(x+n) shouldn't form a progression of any kind.

jdphenix
  • 15,022
  • 3
  • 41
  • 74
georg
  • 211,518
  • 52
  • 313
  • 390

11 Answers11

42

Obfuscate it with some combination of 2 or 3 simple methods:

  • XOR
  • shuffle individual bits
  • convert to modular representation (D.Knuth, Vol. 2, Chapter 4.3.2)
  • choose 32 (or 64) overlapping subsets of bits and XOR bits in each subset (parity bits of subsets)
  • represent it in variable-length numberic system and shuffle digits
  • choose a pair of odd integers x and y that are multiplicative inverses of each other (modulo 232), then multiply by x to obfuscate and multiply by y to restore, all multiplications are modulo 232 (source: "A practical use of multiplicative inverses" by Eric Lippert)

Variable-length numberic system method does not obey your "progression" requirement on its own. It always produces short arithmetic progressions. But when combined with some other method, it gives good results.

The same is true for the modular representation method.

Here is C++ code example for 3 of these methods. Shuffle bits example may use some different masks and distances to be more unpredictable. Other 2 examples are good for small numbers (just to give the idea). They should be extended to obfuscate all integer values properly.

// *** Numberic system base: (4, 3, 5) -> (5, 3, 4)
// In real life all the bases multiplied should be near 2^32
unsigned y = x/15 + ((x/5)%3)*4 + (x%5)*12; // obfuscate
unsigned z = y/12 + ((y/4)%3)*5 + (y%4)*15; // restore

// *** Shuffle bits (method used here is described in D.Knuth's vol.4a chapter 7.1.3)
const unsigned mask1 = 0x00550055; const unsigned d1 = 7;
const unsigned mask2 = 0x0000cccc; const unsigned d2 = 14;

// Obfuscate
unsigned t = (x ^ (x >> d1)) & mask1;
unsigned u = x ^ t ^ (t << d1);
t = (u ^ (u  >> d2)) & mask2;
y = u ^ t ^ (t << d2);

// Restore
t = (y ^ (y >> d2)) & mask2;
u = y ^ t ^ (t << d2);
t = (u ^ (u >> d1)) & mask1;
z = u ^ t ^ (t << d1);

// *** Subset parity
t = (x ^ (x >> 1)) & 0x44444444;
u = (x ^ (x << 2)) & 0xcccccccc;
y = ((x & 0x88888888) >> 3) | (t >> 1) | u; // obfuscate

t = ((y & 0x11111111) << 3) | (((y & 0x11111111) << 2) ^ ((y & 0x22222222) << 1));
z = t | ((t >> 2) ^ ((y >> 2) & 0x33333333)); // restore
Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • Thank you for your answer. If you could provide some pseudo-code examples, if would be great. – georg Dec 19 '11 at 08:07
  • 3
    @thg435 I used C++ instead of pseudocode. Did not want to give untested examples. – Evgeny Kluev Dec 19 '11 at 12:41
  • 1
    When I try the numeric system base code above with x=99, I get z=44. – Harvey Jun 21 '13 at 15:40
  • @Harvey: to get reversible obfuscator the product of all bases should be larger than number to obfuscate. In this example 3*4*5=60, so any larger number (like 99) will not necessarily be restored to the same value. – Evgeny Kluev Jun 21 '13 at 16:04
  • @EvgenyKluev: How would I use it to encode any 32-bit number? Would I need to use 64-bit numbers? – Harvey Jun 22 '13 at 03:17
  • 1
    @Harvey: Also it is possible to get product of all bases smaller but very close to 2^32, and then obfuscate the remaining values using a small table. In this case everything remains in 32-bit numbers. – Evgeny Kluev Jun 22 '13 at 09:49
  • After reading OP's and this answer I genuinely thought 'numberic' was a real word I didn't know. – Trap Oct 14 '20 at 09:34
8

You want the transformation to be reversible, and not obvious. That sounds like an encryption that takes a number in a given range and produces a different number in the same range. If your range is 64 bit numbers, then use DES. If your range is 128 bit numbers then use AES. If you want a different range, then your best bet is probably Hasty Pudding cipher, which is designed to cope with different block sizes and with number ranges that do not fit neatly into a block, such as 100,000 to 999,999.

rossum
  • 15,344
  • 1
  • 24
  • 38
  • Interesting stuff, but it may be a bit hard to ask someone to implement a cipher that 1) has not been tested well and 2) had not been tested well because it is so hard to understand :) – Maarten Bodewes Dec 19 '11 at 02:20
  • Thanks! I'm trying to keep it as simple as possible though. – georg Dec 19 '11 at 08:09
  • If you can't find an implementation of Hasty Pudding (you only need one of the allowed sizes) then you can easily implement a simple 4-round Feistel cipher (http://en.wikipedia.org/wiki/Feistel_cipher) in an even block size. Just keep encrypting until the output is in the correct range, as with Hasty Pudding. Not secure, but enough to obfuscate. – rossum Dec 19 '11 at 13:08
  • NSA has now released the [Speck](https://en.wikipedia.org/wiki/Speck_(cipher)) cipher which includes versions which includes 32-bit and 48-bit block sizes. That may also be useful for obfuscating numbers with those sizes. The 32-bit version in particular is likely to be useful. – rossum May 22 '16 at 13:03
5

I found this particular piece of Python/PHP code very useful:

https://github.com/marekweb/opaque-id

georg
  • 211,518
  • 52
  • 313
  • 390
miracle2k
  • 29,597
  • 21
  • 65
  • 64
5

I wrote some JS code using some of the ideas in this thread:

const BITS = 32n;
const MAX = 4294967295n;
const COPRIME = 65521n;
const INVERSE = 2166657316n;
const ROT = 6n;
const XOR1 = 10296065n; 
const XOR2 = 2426476569n;


function rotRight(n, bits, size) {
    const mask = (1n << bits) - 1n;
    // console.log('mask',mask.toString(2).padStart(Number(size),'0'));
    const left = n & mask;
    const right = n >> bits;
    return (left << (size - bits)) | right;
}

const pipe = fns => fns.reduce((f, g) => (...args) => g(f(...args)));

function build(...fns) {
    const enc = fns.map(f => Array.isArray(f) ? f[0] : f);
    const dec = fns.map(f => Array.isArray(f) ? f[1] : f).reverse();

    return [
        pipe(enc),
        pipe(dec),
    ]
}

[exports.encode, exports.decode] = build(
    [BigInt, Number],
    [i => (i * COPRIME) % MAX, i => (i * INVERSE) % MAX],
    x => x ^ XOR1,
    [x => rotRight(x, ROT, BITS), x => rotRight(x, BITS-ROT, BITS)],
    x => x ^ XOR2,
);

It produces some nice results like:

1 1352888202n 1 'mdh37u'
2 480471946n 2 '7y26iy'
3 3634587530n 3 '1o3xtoq'
4 2225300362n 4 '10svwqy'
5 1084456843n 5 'hxno97'
6 212040587n 6 '3i8rkb'
7 3366156171n 7 '1jo4eq3'
8 3030610827n 8 '1e4cia3'
9 1889750920n 9 'v93x54'
10 1017334664n 10 'gtp0g8'
11 4171450248n 11 '1wzknm0'
12 2762163080n 12 '19oiqo8'
13 1621319561n 13 'qtai6h'
14 748903305n 14 'cdvlhl'
15 3903018889n 15 '1sjr8nd'
16 3567473545n 16 '1mzzc7d'
17 2426613641n 17 '144qr2h'
18 1554197390n 18 'ppbudq'
19 413345678n 19 '6u3fke'
20 3299025806n 20 '1ik5klq'
21 2158182286n 21 'zoxc3y'
22 1285766031n 22 'l9iff3'
23 144914319n 23 '2ea0lr'
24 4104336271n 24 '1vvm64v'
25 2963476367n 25 '1d0dkzz'
26 2091060108n 26 'ykyob0'
27 950208396n 27 'fpq9ho'
28 3835888524n 28 '1rfsej0'
29 2695045004n 29 '18kk618'
30 1822628749n 30 'u559cd'
31 681777037n 31 'b9wuj1'
32 346231693n 32 '5q4y31'

Testing with:

  const {encode,decode} = require('./obfuscate')

  for(let i = 1; i <= 1000; ++i) {
        const j = encode(i);
        const k = decode(j);
        console.log(i, j, k, j.toString(36));
   }

XOR1 and XOR2 are just random numbers between 0 and MAX. MAX is 2**32-1; you should set this to whatever you think your highest ID will be.

COPRIME is a number that's coprime w/ MAX. I think prime numbers themselves are coprime with every other number (except multiples of themselves).

INVERSE is the tricky one to figure out. These blog posts don't give a straight answer, but WolframAlpha can figure it out for you. Basically, just solve the equation (COPRIME * x) % MAX = 1 for x.

The build function is something I created to make it easier to create these encode/decode pipelines. You can feed it as many operations as you want as [encode, decode] pairs. These functions have to be equal and opposite. The XOR functions are their own compliments so you don't need a pair there.


Here's another fun involution:

function mixHalves(n) {
    const mask = 2n**12n-1n;
    const right = n & mask;
    const left = n >> 12n;
    const mix = left ^ right;
    return (mix << 12n) | right;
}

(assumes 24-bit integers -- just change the numbers for any other size)

mpen
  • 272,448
  • 266
  • 850
  • 1,236
  • 1
    cool, thanks for sharing! BTW what's "32n"? Never saw this before. – georg Apr 05 '19 at 08:01
  • 1
    `n` is a number postfix for [BigInts](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt). It's a new JS feature that allows you to process really big numbers. I needed to use it because I'm multiplying by really big numbers which might cause one of the intermediate values to temporarily exceed `Number.MAX_SAFE_INTEGER` and lose precision. – mpen Apr 05 '19 at 08:09
5

Obfuscation is not really sufficient in terms of security.

However, if you are trying to thwart the casual onlooker, I'd recommend a combination of two methods:

  • A private key that you combine with the id by xor'ing them together
  • Rotating the bits by a certain amount both before and after the key has been applied

Here is an example (using pseudo code):

  def F(x)
    x = x XOR 31415927       # XOR x with a secret key
    x = rotl(x, 5)           # rotate the bits left 5 times
    x = x XOR 31415927       # XOR x with a secret key again
    x = rotr(x, 5)           # rotate the bits right 5 times
    x = x XOR 31415927       # XOR x with a secret key again
    return x                 # return the value
  end

I haven't tested it, but I think this is reversible, should be fast, and not too easy to tease out the method.

IAmNaN
  • 10,305
  • 3
  • 53
  • 51
  • There's also addition of a constant mod 2^32 (because your bit rotation reminded me of rot13, everyone's favorite trivially reversible function). – ccoakley Dec 18 '11 at 20:50
  • That's really just `return x XOR rotr(31415927, 5)` though, right? The last xor undoes the first, and the rotates undo eachother.. of course any chain of reversible operations is also reversible, so it does satisfy that condition. – harold Dec 18 '11 at 21:04
  • I've run a few brief tests and am satisfied the results are as expected. As ccoakley mentions, rot13 can be used in place of rot5, any rotation will work (caveat: 0 > rot > integer-size) and could be considered another key. There are other things you could throw in here, like modulus as he suggests, and long as they're reversible as harold mentioned. – IAmNaN Dec 18 '11 at 21:43
  • @harold: Yeah, just rotating once to the left with a prime and once to the right with another prime would be more advisable. Adding and throwing away the carry (identical to mod largest integer + 1) would add some obfuscation as well. You can mix all these up as long as you don't accidentially undo your previous operations. The reverse is just the reverse. – Maarten Bodewes Dec 19 '11 at 02:16
  • 1
    Sorry, but @harold is mostly correct - your entire function is equivalent to `x = x XOR F(0)`, or `x = x XOR 3087989491`, or `x = x XOR rotr(31415927, 5)`. Your first and last xors negate each other, so all you're doing is xoring the bitshifted input with the key - or equivalently, xoring the input with the bitshifted key. Note this is true even if you used different keys for each stage - all the keys could be composited into a single key that can be xored with the plaintext. – Nick Johnson Dec 19 '11 at 04:16
  • 2
    It's even worse, it's pretty easy to prove that any chain of rotates by a constant offset and xors with a constant can be condensed to just one rotate and just one xor. Two rotates after eachother can be combined (add their offset), two xors after eachother can be combined (xor with the xor of the two constants), and a xor/rot pair can be swapped to rot/xor by applying the same rotation to the constant in the xor. – harold Dec 19 '11 at 09:38
2

Do anything with the bits of the ID that won't destroy them. For example:

  • rotate the value
  • use lookup to replace certain parts of the value
  • xor with some value
  • swap bits
  • swap bytes
  • mirror the whole value
  • mirror a part of the value
  • ... use your imagination

For decryption, do all that in reverse order.

Create a program that will 'encrypt' some interesting values for you and put them in a table you can examine. Have same program TEST your encryption/decryption routine WITH all set of values that you want to have in your system.

Add stuff to the above list into the routines until your numbers will look properly mangled to you.

For anything else, get a copy of The Book.

Daniel Mošmondor
  • 19,718
  • 12
  • 58
  • 99
  • What you describe are the building blocks of a block cipher. It makes more sense to use an existing one than invent your own. – Nick Johnson Dec 19 '11 at 04:18
  • @NickJohnson I know that, did you clicked on the link in the last line of my post? – Daniel Mošmondor Dec 19 '11 at 07:31
  • I failed to put together a rotl/xor combination giving results that look "random" enough (see the update). Any pointers? – georg Dec 19 '11 at 09:30
  • @DanielMošmondor I know what you're linking to - but that doesn't change the fact that you're initially suggesting he build something himself, when it makes much more sense to just use an existing one? – Nick Johnson Dec 19 '11 at 23:24
  • @NickJohnson obviously OP doesn't want to use existing crypto, as he either wants to learn or not to learn new APIs. I totally can relate to that. – Daniel Mošmondor Dec 20 '11 at 00:36
  • @DanielMošmondor I don't think that's obvious at all from his question. Using a simple but robust cipher like XXTEA seems totally justified based on his question. – Nick Johnson Dec 20 '11 at 02:18
2

I wrote an article on secure permutations with block ciphers, which ought to fulfil your requirements as stated.

I'd suggest, though, that if you want hard to guess identifiers, you should just use them in the first place: generate UUIDs, and use those as the primary key for your records in the first place - there's no need to be able to convert to and from a 'real' ID.

shmosel
  • 49,289
  • 6
  • 73
  • 138
Nick Johnson
  • 100,655
  • 16
  • 128
  • 198
  • 2
    @thg435 If you're interested in this approach, a useful search term is "Format-Preserving Encryption". The wikipedia page covers the Black/Rogaway paper mentioned in Nick's article, as well as more recent developments. I've successfully used FPE for something similar to what you're doing; although in my case I added a few bits besides the id which I used for some light validity checking. – Paul Du Bois Jun 12 '13 at 01:22
1

Not sure how "hard" you need it to be, how fast, or how little memory to use. If you have no memory constraints you could make a list of all integers, shuffle them and use that list as a mapping. However, even for a 4 byte integer you would need a lot of memory.

However, this could be made smaller so instead of mapping all integers you would map only 2 (or worst case 1) byte and apply this to each group in the integer. So, using 2 bytes a integer would be (group1)(group2) you would map each group through the random map. But that means that if you only change group2 then the mapping for group1 would stay the same. This could "fixed" by mapping different bits to each group.

So, *(group2) could be (bit 14,12,10,8,6,4,2,0) so, adding 1 would change both group1 and group2.

Still, this is only security by obscurity, anyone that can feed numbers into your function (even if you keep the function secret) could fairly easily figure it out.

Roger Lindsjö
  • 11,330
  • 1
  • 42
  • 53
  • Depending on the constraints of the system this probably won't work, because if you can invert F(x) back to x then you'd have to have the permutation available, from which you could then easily compute F(y) given any arbitrary y. – templatetypedef Dec 18 '11 at 20:06
  • @templatetypedef As I said, this is only security by obscurity. The permutation would have to be known, but you could see the permutation as the "key". The largest problem here is that the OP seems to want to be able to encrypt all messages in one set (a small one) where the encrypted message should fit in the same set and this should be valid for all messages in the set. – Roger Lindsjö Dec 18 '11 at 20:15
  • Thanks. I'm trying to avoid any lookup tables. – georg Dec 19 '11 at 09:27
1

What you're describing here seems to be the opposite of a one-way function: it's easy to invert but super difficult to apply. One option would be to use a standard, off-the-shelf public-key encryption algorithm where you fix a (secret, randomly-chosen) public key that you keep a secret and a private key that you share with the world. That way, your function F(x) would be the encryption of x using the public key. You could then easily decrypt F(x) back to x by using the private decryption key. Notice that the roles of the public and private key are reversed here - you give out the private key to everyone so that they can decrypt the function, but keep the public key secret on your server. That way:

  1. The function is a bijection, so it's invertible.
  2. Given F(x), x is efficiently computable.
  3. Given x and F(x), it is extremely difficult to compute F(y) from y, since without the public key (assuming you use a cryptographically strong encryption scheme) there is no feasible way to encrypt the data, even if the private decryption key is known.

This has many advantages. First, you can rest assured that the crypto system is safe, since if you use a well-established algorithm like RSA then you don't need to worry about accidental insecurity. Second, there are already libraries out there to do this, so you don't need to code much up and can be immune to side-channel attacks. Finally, you can make it possible for anyone to go and invert F(x) without anyone actually being able to compute F(x).

One detail- you should definitely not just be using the standard int type here. Even with 64-bit integers, there are so few combinations possible that an attacker could just brute-force try inverting everything until they find the encryption F(y) for some y even if they don't have the key. I would suggest using something like a 512-bit value, since even a science fiction attack would not be able to brute-force this.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • But thg435 seems to be asking for encryption that can encrypt a small set of messages (4 byte messages) into the same set of messages, and the encryption should work for all messages. – Roger Lindsjö Dec 18 '11 at 20:20
  • Thank you for your answer. Using a full-blown encryption framework is perhaps the best way to do that, but a bit too "heavy" for my needs. – georg Dec 19 '11 at 09:29
1

Generate a private symmetric key for use in your application, and encrypt your integer with it. This will satisfy all three requirements, including the hardest #3: one would need to guess your key in order to break your scheme.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • thg435 asked for integer to integer (and as far as I understand it should work for all integers). Can you suggest an private key algorithm that would have these properties? – Roger Lindsjö Dec 18 '11 at 20:18
1

If xor is acceptable for everything but inferring F(y) given x and F(x) then I think you can do that with a salt. First choose a secret one-way function. For example S(s) = MD5(secret ^ s). Then F(x) = (s, S(s) ^ x) where s is chosen randomly. I wrote that as a tuple but you can combine the two parts into an integer, e.g. F(x) = 10000 * s + S(s) ^ x. The decryption extracts the salt s again and uses F'(F(x)) = S(extract s) ^ (extract S(s)^x). Given x and F(x) you can see s (though it is slightly obfuscated) and you can infer S(s) but for some other user y with a different random salt t the user knowing F(x) can't find S(t).

Ben Jackson
  • 90,079
  • 9
  • 98
  • 150