5

I'm working in C#. I have an unsigned 32-bit integer i that is incremented gradually in response to an outside user controlled event. The number is displayed in hexadecimal as a unique ID for the user to be able to enter and look up later. I need i to display a very different 8 character string if it is incremented or two integers are otherwise close together in value (say, distance < 256). So for example, if i = 5 and j = 6 then:

string a = Encoded(i); // = "AF293E5B"
string b = Encoded(j); // = "CD2429A4"

The limitations on this are:

  1. I don't want an obvious pattern in how the string changes in each increment.
  2. The process needs to be reversible, so if given the string I can generate the original number.
  3. Each generated string needs to be unique for the entire range of a 32-bit unsigned integers, so that two numbers don't ever produce the same string.
  4. The algorithm to produce the string should be fairly easy to implement and maintain for both encoding and decoding (maybe 30 lines each or less).

However:

  1. The algorithm does not need to be cryptographically secure. The goal is obfuscation more than encryption. The number itself is not secret, it just needs to not obviously be an incrementing number.
  2. It is alright if looking at a large list of incremented numbers a human can discern a pattern in how the strings are changing. I just don't want it to be obvious if they are "close".

I recognize that a Minimal Perfect Hash Function meets these requirements, but I haven't been able to find one that will do what I need or learn how to derive one that will.

I have seen this question, and while it is along similar lines, I believe my question is more specific and precise in its requirements. The answer given for that question (as of this writing) references 3 links for possible implementations, but not being familiar with Ruby I'm not sure how to get at the code for the "obfuscate_id" (first link), Skipjack feels like overkill for what I need (2nd link), and Base64 does not use the character set I'm interested in (hex).

Community
  • 1
  • 1
hatch22
  • 797
  • 6
  • 18
  • "I don't want an obvious pattern in how the string changes in each increment." - Well, creating your own obfuscating pattern is obviously not a good idea too, unless you will heavily test it. Have you tried looking for any libraries/API that are available out there? – 123 456 789 0 Sep 09 '13 at 15:45
  • The only libraries I've seen are cryptographic in nature and don't generally deal well with numbers as small as 32-bit integers. A 32-bit block cypher would work but seems to be overkill for what I need, and I'd rather implement it myself since security is not an issue and I am trying to keep external dependencies down. – hatch22 Sep 09 '13 at 15:49
  • FWIW, the actual algorithm used by "obfuscate_id" is here: https://github.com/namick/scatter_swap/blob/master/lib/scatter_swap/hasher.rb – Daniel Hilgarth Sep 09 '13 at 15:53
  • Thanks. I'll see if I can port it if other answers don't end up working for me. – hatch22 Sep 09 '13 at 15:56
  • possible duplicate of [How to create unpredictable userids?](http://stackoverflow.com/questions/18461230/how-to-create-unpredictable-userids) – Jim Mischel Sep 09 '13 at 16:21
  • 1
    @hatch22 What is the point of obfuscating the ids if you don't also want it to be secure (i.e. that it's hard to reverse the encoding)? I mean what is an "obvious pattern"? At which point does it become "non-obvious". To me, these ambiguities are why it's not worth doing. If you need to make decoding difficult then make it _actually_ difficult (i.e. cryptographically secure). I may just be naive as to what the requirements are though. **edit** Ah if you are just trying to prevent stuff like JimMischel's web-scraping example, then the simple modular arithmetic trick MSalter has makes sense. – rliu Sep 09 '13 at 16:22
  • Also see [Obfuscating sequential keys](http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=839) – Jim Mischel Sep 09 '13 at 16:23
  • @roliu: YouTube video ids are obfuscated sequential keys. One benefit is that it eliminates generating sequential keys that are substantially the same. YouTube gets around a million new videos per day. So if the keys weren't obfuscated, then the first 8 characters of every video uploaded each day would be the same. Also, it's easier for users to see the difference between "AF293E5B" and "CD2429A4" than between "ABCD123B" and "ABCD123C". – Jim Mischel Sep 09 '13 at 16:31
  • @roliu: My reasons are similar to what Jim gave for YouTube's obfuscated keys. In my case, IDs used by the same user will tend to cluster because they will be generated close together in time. I don't want a user who mistypes an ID to pull up the wrong record that still might belong to them. If a user generated record IDs 1005, 1006, and 1007, I don't want them pulling up 1006 when they meant 1005. – hatch22 Sep 09 '13 at 17:09
  • I also don't want users to be easily able to guess the IDs of someone they know started creating records shortly after them. – hatch22 Sep 09 '13 at 17:17

1 Answers1

3

y = p * x mod q is reversible if p and q are co-primes. In particular, mod 2^32 is easy, and any odd number is a co-prime of 2^32. Now 17,34,51,... is a bit too easy, but the pattern is less obvious for 2^31 < p < 2^32-2^30 (0x8000001-0xBFFFFFFF).

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • 1
    This pretty much looks like RSA encryption, check http://stuvel.eu/rsa for a pure-Python implementation. The core is in ```rsa.prime``` and ```rsa.core```, you should be able to follow it enough to convert it into C# ;-) – dr. Sybren Sep 09 '13 at 16:16
  • Modular arithmetic is quite close to normal arithmetic. How do you solve `y = p*x` for `x`? You multiply both sides by `1/p`. In this case your domain isn't the reals, it's the integers up to `q-1`. So what is `1/p`? Well it's `1/p = z` such that `z*p = 1 mod q`, which leads to `z*p - 1 = kq => z*p - kq = 1` for some integer `k`. This equation has an _integer solution_ because `p` and `q` are co-prime and because of Bezout's identity: https://en.wikipedia.org/wiki/B%C3%A9zout%27s_identity. The short answer is... just find the inverse `z = 1/p` for `mod q`. Example: `q = 5, p = 2, 1/p = 3`. – rliu Sep 09 '13 at 16:18
  • Looks like I may have created a duplicate, and the answer linked there uses a 32-bit block cypher, but MSalters answer seems to be working for me, so I'll go ahead and accept it. – hatch22 Sep 09 '13 at 17:00