I am looking for a way to generate a random, unique 9 digit friend code for a user from a sequential user ID. The idea behind this is so people can't enumerate users by searching the friend codes one by one. If there are 1000 possible codes and 100 registered users, searching a random code should have a 10% chance of finding a user.

A possible way to do this is to generate a code randomly, check if the code is already in use, and if it is, try again. I am looking for an approach (mostly out of curiosity) where the friend code is generated algorithmically and is guaranteed to be unique for that user ID on the first try.

Specifically, given a range of numbers (1 to 999,999,999), running the function on this number should return another number in the same range, which is paired and unique to the input number. This pairing should only differ if the range changes and/or an input seed to the randomness changes.

An individual should ideally not be able to easily reverse engineer the user ID from the friend ID without knowing the seed and algorithm (or having a very large pool of samples and a lot of time - this does not need to be cryptographically secure), so simply subtracting the user ID from the maximum range is not a valid solution.

Here is some C# code that accomplishes what I am after by generating the entire range of numbers, shuffling the list, then retrieving a friend ID by treating the user ID as the list index:

int start = 1; // Starting number (inclusive)
int end = 999999999; // End number (inclusive)
Random random = new Random(23094823); // Random with a given seed

var friendCodeList = new List<int>();
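friendCodeList.AddRange(Enumerable.Range(start, end + 1)); // Populate list. Note: Range takes (start, count), so this yields 1..1,000,000,000; use end - start + 1 for exactly 1..end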

int n = friendCodeList.Count;

// Shuffle the list, this should be the same for a given start, end and seed
while (n > 1)
{
    n--;
    int k = random.Next(n + 1);
    int value = friendCodeList[k];
    friendCodeList[k] = friendCodeList[n];
    friendCodeList[n] = value;
}

// Retrieve friend codes from the list
var userId = 1;
Console.WriteLine($"User ID {userId}: {friendCodeList[userId]:000,000,000}");

userId = 99999999;
Console.WriteLine($"User ID {userId}: {friendCodeList[userId]:000,000,000}");

userId = 123456;
Console.WriteLine($"User ID {userId}: {friendCodeList[userId]:000,000,000}");

User ID 1: 054,677,867
User ID 99999999: 237,969,637
User ID 123456: 822,632,399

Unfortunately, this is unsuitable for large ranges: this program takes 8GB of RAM to run, and with a 10 or 12 digit friend code it would not be feasible to pre-generate the list either in memory or in a database. I am looking for a solution that does not require this pre-generation step.

I am interested in solutions that use either a seeded random number generator or bitwise trickery to achieve this, if it is possible. The above function is reversible (by searching the values of the list) but the solution does not need to be.

Emma
Neo
  • "The easiest way to do this is to generate a code randomly, check if the code is already in use, and if it is, try again" - no, this is **not** the easiest way. Conceptually, it's simpler to just use a block-cipher to encrypt your integer identifiers: https://stackoverflow.com/questions/959916/way-to-encrypt-a-single-int – Dai Jun 17 '20 at 22:38
  • This looks like what I'm after, except I'm not sure how to adapt a block cipher to a range of numbers instead of 2^n numbers. – Neo Jun 17 '20 at 22:47
  • Why not just use a `Guid`? They're generated based on the computer hardware, time of day, the algorithm version used to generate it, and some other uniqueness. – Rufus L Jun 17 '20 at 22:49
  • The output can only range from 1 to 999,999,999, a Guid is the wrong format. – Neo Jun 17 '20 at 22:52
  • The first part of this solution looks good except I'm unsure how to change the domain from 0->2^30-1 to 1->999999999: https://stackoverflow.com/a/11756222/9665729 – Neo Jun 17 '20 at 22:54
  • I'm a little tired so I may need to read it again, but a `Guid` is *globally* unique. No algorithm required. I don't see why you'd need to do anything except `person.FriendCode = new Guid();` Does it need to be related to the user id for any reason? – Rufus L Jun 17 '20 at 22:59
  • As mentioned, a GUID is in the wrong format – Neo Jun 17 '20 at 23:07
  • Another possible relevant post: https://crypto.stackexchange.com/questions/29469/is-there-a-way-to-encrypt-an-integer-within-an-arbitrary-range – Neo Jun 17 '20 at 23:09
  • A common technique is Hashids: https://hashids.org/ – Dai Jun 17 '20 at 23:25
  • Yeah, I still don't get it. What you want is a simple `List` of type integer where the `Index` of that `List` is equivalent to the "UserID", but the value is a guaranteed unique-to-the-list integer that is non-sequential. And according to your code that pool of potential numbers is equal to the count of entries in the `List`? And here is what makes things even more confusing: "I am looking for a solution that does not require this pre-generation step." What? – Barns Jun 17 '20 at 23:32
  • @Barns The OP is asking how to write a map-style function that always correctly maps one single integer to another (the stuff about List indexes is a distraction, imo). They're trying to invent their own cryptosystem (with a block-size of 4 bytes) - which as we all know (invariably through personal experience!) is a fool's errand. – Dai Jun 17 '20 at 23:35
  • "I am looking for a way to generate a random, unique 9 digit friend code for a user from a sequential user ID" - it's not random if it is generated from a sequential user ID. – Enigmativity Jun 18 '20 at 03:14
  • Did you consider whether to store (randomly generated) friend codes separately from user IDs in your database, to check new friend codes for uniqueness as they're generated, and to look up the corresponding user ID or friend code when the other value is given? If so, why doesn't this work for you? Also, consider that `System.Random` is not a secure random number generator. – Peter O. Jun 18 '20 at 03:32
  • What you are looking for is called a *permutation* in mathematics. If you don't care about security they are pretty easy to construct. – President James K. Polk Jun 18 '20 at 20:13
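
The permutation idea from the comments, together with the question of how to shrink a 2^30 domain down to 1..999,999,999, can be sketched with a small Feistel network plus "cycle-walking" (re-applying the permutation until the result falls inside the range). This is only a minimal sketch under stated assumptions: the class name `FriendCodes`, the method names, the round keys and the mixing function are all hypothetical choices made for illustration, and the construction is not cryptographically secure.

```csharp
using System;

public static class FriendCodes
{
    private const uint Count = 999_999_999;              // size of the range 1..999,999,999
    private const int HalfBits = 15;                     // 2 * 15 = 30 bits, and 2^30 >= Count
    private const uint HalfMask = (1u << HalfBits) - 1;

    // Hypothetical round keys standing in for the "seed"; choose your own and keep them private.
    private static readonly uint[] RoundKeys = { 0x9E3779B9, 0x7F4A7C15, 0x85EBCA6B, 0xC2B2AE35 };

    // Round function: any deterministic mixing works here, it does not have to be invertible.
    private static uint Round(uint half, uint key)
    {
        unchecked
        {
            uint h = (half ^ key) * 0x45D9F3Bu;
            h ^= h >> 16;
            h *= 0x45D9F3Bu;
            h ^= h >> 15;
            return h & HalfMask;
        }
    }

    // A Feistel network is a permutation of 0..2^30-1 regardless of the round function.
    private static uint Permute30(uint value, bool forward)
    {
        uint left = value >> HalfBits, right = value & HalfMask;
        for (int i = 0; i < RoundKeys.Length; i++)
        {
            // Decryption runs the same rounds with the keys in reverse order.
            uint key = RoundKeys[forward ? i : RoundKeys.Length - 1 - i];
            if (forward) { uint t = left ^ Round(right, key); left = right; right = t; }
            else         { uint t = right ^ Round(left, key); right = left; left = t; }
        }
        return (left << HalfBits) | right;
    }

    // Cycle-walking: keep permuting until the value lands back inside 0..Count-1.
    private static uint Walk(uint oneBased, bool forward)
    {
        if (oneBased < 1 || oneBased > Count) throw new ArgumentOutOfRangeException(nameof(oneBased));
        uint v = oneBased - 1;
        do { v = Permute30(v, forward); } while (v >= Count);
        return v + 1;
    }

    public static uint ToFriendCode(uint userId) => Walk(userId, forward: true);
    public static uint ToUserId(uint friendCode) => Walk(friendCode, forward: false);
}
```

Something like `Console.WriteLine($"{FriendCodes.ToFriendCode(123456):000,000,000}")` would then print a nine-digit code, and `FriendCodes.ToUserId` maps it back. Because 2^30 is only about 7% larger than the range, the loop rarely needs more than one or two iterations.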

2 Answers

Quick mathematics lesson!

You're thinking of developing a way to map one integer value (the original "secret" UserId value) to another (the (encrypted) "public" value) and back again. This is exactly what a block-cipher does (except each "block" is usually 16 bytes big instead of being a single character or integer value). So in other words, you want to create your own cryptosystem.

(Note that even if you're thinking of converting UserId 123 into a string instead of an integer, for example a YouTube Video Id like "dQw4w9WgXcQ", it's still an integer: every scalar value stored in a computer, including strings, can be represented as an integer - hence the "illegal primes" problem back in the early 2000s.)

And the biggest, most important take-away from any undergraduate-level computer-science class on cryptography is: never create your own cryptosystem!

With that out of the way...

Provided that security is not a top-concern...

...and you're only concerned with preventing disclosure of incrementing integer Id values (e.g. so your visitors and users don't see how many database records you really have) then use a Hashids library: https://hashids.org/

In your code, construct a single Hashids object (I'd use a public static readonly field or property - or better yet: a singleton injectable service) and use the .Encode method to convert any integer int/Int32 value into a string value.

To convert the string value back to the original int/Int32, use the .Decode method.

As an aside, I don't like that the library is called "Hashids": hashes are meant to be one-way functions, but these values are reversible, albeit only with a secret "salt" value (why isn't it called a "key"?), so it isn't really a hash, imo.


If security really matters...

Then you need to treat each integer value as a discrete block in a block cipher (not a stream-cipher, because each value needs to be encrypted and decrypted independently by itself).

For the purposes of practicality, you need to use a symmetric block cipher with a small block-size. Unfortunately many block ciphers with small block sizes aren't very good (TripleDES has a block size of 64-bits - but it's weak today), so let's stick with AES.

AES has a block-size of 128 bits (16 bytes) - that's the same as two Int64 integers concatenated with each other. Assuming you use base64url encoding on a 16-byte value then your output will be 22 characters long once the trailing '=' padding is stripped (as Base64 uses 6 bits per character). If you're comfortable with strings of this length then you're all set. The shortest URL-safe string you can generate from a 128-bit value is 21 characters (hardly an improvement at all) because Base-73 is the most you can safely use in a URL that will survive all modern URL-transmission systems (never automatically assume Unicode is supported anywhere when dealing with plaintext).

It is possible to adapt AES to generate smaller output block-sizes, but it won't work in this case because using techniques like CTR Mode mean that the generated output needs to include extra state information (IV, counter, etc) which will end-up taking up the same amount of space as was gained.

Here's the code:

Very important notes:

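using System;
using System.Security.Cryptography;

// The members below belong inside a class of your choosing.
private static readonly Byte[] _key = new Byte[] { }; // Must be 128, 192 or 256 bits (16, 24, or 32 bytes) in length.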

private static readonly Byte[] _iv = new Byte[16]; // AES requires a 16-byte IV. You could use the default all-zeroes.

// Encrypts or decrypts exactly one 16-byte AES block.
private static Byte[] ProcessBlock( Byte[] inputBlock, Boolean encrypt )
{
    using( Aes aes = Aes.Create() )
    {
        aes.Key     = _key;
        aes.IV      = _iv;
        aes.Padding = PaddingMode.None; // The input is always one full block; the default PKCS7 padding would append a second block on encryption.

        using( ICryptoTransform xform = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor() )
        {
            return xform.TransformFinalBlock( inputBlock, 0, inputBlock.Length );
        }
    }
}

public static Byte[] EncryptInteger( Int64 value )
{
    Byte[] inputBlock = new Byte[16];
    inputBlock[0] = (Byte)(value >>  0 & 0xFF);
    inputBlock[1] = (Byte)(value >>  8 & 0xFF);
    inputBlock[2] = (Byte)(value >> 16 & 0xFF);
    inputBlock[3] = (Byte)(value >> 24 & 0xFF);
    inputBlock[4] = (Byte)(value >> 32 & 0xFF);
    inputBlock[5] = (Byte)(value >> 40 & 0xFF);
    inputBlock[6] = (Byte)(value >> 48 & 0xFF);
    inputBlock[7] = (Byte)(value >> 56 & 0xFF);

    return ProcessBlock( inputBlock, encrypt: true );
}

public static Int64 DecryptInteger( Byte[] block )
{
    Byte[] outputBlock = ProcessBlock( block, encrypt: false );

    // Reassemble the little-endian Int64 from the first 8 bytes of the decrypted block.
    return
        (Int64)outputBlock[0] <<  0 |
        (Int64)outputBlock[1] <<  8 |
        (Int64)outputBlock[2] << 16 |
        (Int64)outputBlock[3] << 24 |
        (Int64)outputBlock[4] << 32 |
        (Int64)outputBlock[5] << 40 |
        (Int64)outputBlock[6] << 48 |
        (Int64)outputBlock[7] << 56;
}

public static String EncryptIntegerToString( Int64 value ) => Convert.ToBase64String( EncryptInteger( value ) ).TrimEnd( '=' ).Replace( '+', '-' ).Replace( '/', '_' );

public static Int64 DecryptIntegerFromString( String base64Url )
{
    if( String.IsNullOrWhiteSpace( base64Url ) ) throw new ArgumentException( message: "Invalid string.", paramName: nameof(base64Url) );

    // Convert Base64Url to Base64 and restore the '=' padding that was trimmed on encoding:
    String base64 = base64Url.Replace( '-', '+' ).Replace( '_', '/' );
    base64 = base64.PadRight( base64.Length + ( 4 - base64.Length % 4 ) % 4, '=' );

    Byte[] block = Convert.FromBase64String( base64 );
    return DecryptInteger( block );
}
Dai

A simple method like this can produce a long sequence of numbers provided you get the constants right.

ulong Next(ulong current)
{
    unchecked
    {
        // The product stays below 2^64 for any current < 999_999_999, so nothing wraps.
        return (999_999_937UL * current + 383_565_383UL) % 999_999_999UL;
    }
}

From memory, this kind of function can produce a non-repeating sequence of up to 999_999_999 numbers if the values in the function are chosen correctly.

My test code shows that this method can produce 500_499 numbers without repeating.

My computer can produce the entire sequence in just under 9 milliseconds so it is a fairly fast algorithm.

The first ten elements of this sequence (with leading '0's padded) are:

383565383, 602511613, 027845340, 657154301, 639998680, 703647183, 757439993, 422285770, 201847617, 869013116


5_960_464 * current + 383_565_383L gives a sequence length of 1_000_998 before repetition.

Enigmativity
  • But how do you then decode the numbers back? – Dai Jun 18 '20 at 07:02
  • @Dai - What do you mean? I don't understand what you're asking? – Enigmativity Jun 18 '20 at 07:29
  • The OP was asking how to convert from one to the other and back again (i.e. to encode and to decode). So if I have an output value from your `Next` function, how can I get the original `current` value that it's derived from? – Dai Jun 18 '20 at 07:33
  • @Dai - That's easy with brute force. I can run through all 500,000 numbers in 9ms. It would be trivial to run through to find the input. – Enigmativity Jun 19 '20 at 02:54
  • But that approach won't scale - that's adding 9ms at 100% CPU utilization for every single request. That's unacceptably heavy for a production workload. – Dai Jun 19 '20 at 02:55
  • @Dai - It's 100% on one core only. And it's a once off calculation per session. – Enigmativity Jun 19 '20 at 03:07
  • It wouldn't be once-per-session, it would be once-per-**request**. In cloud and shared-hosting scenarios hogging a whole CPU thread for 9ms would result in the sysadmin writing you a strongly worded email. What this code does is essentially the same thing as mining bitcoin - except without earning anything from doing so. – Dai Jun 19 '20 at 03:11
  • @Dai - I'd say that the bitcoin mining example is a bit of a stretch. You could argue that all code does essentially the same thing as bitcoin mining to be honest. – Enigmativity Jun 19 '20 at 05:42
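
On the decoding question raised in this comment thread: if the `Next` function is applied directly to a value rather than iterated as a sequence, it is an affine map modulo 999,999,999. Since the multiplier 999,999,937 shares no factor with the modulus, that map is a bijection on 0..999,999,998 and can be inverted with a modular inverse instead of a brute-force search. The sketch below assumes that usage; the names `AffineCodes`, `Encode`, `Decode` and `ModInverse` are hypothetical. An affine map is also easy to recover from a few known input/output pairs, so it only obscures IDs lightly.

```csharp
using System;

public static class AffineCodes
{
    private const long Modulus    = 999_999_999;   // values lie in 0..999,999,998
    private const long Multiplier = 999_999_937;   // from the answer; coprime to Modulus
    private const long Increment  = 383_565_383;   // from the answer

    // Modular inverse of a mod m via the extended Euclidean algorithm (needs gcd(a, m) == 1).
    private static long ModInverse(long a, long m)
    {
        long r0 = m, r1 = a % m, t0 = 0, t1 = 1;
        while (r1 != 0)
        {
            long q = r0 / r1;
            (r0, r1) = (r1, r0 - q * r1);
            (t0, t1) = (t1, t0 - q * t1);
        }
        if (r0 != 1) throw new ArgumentException("Multiplier must be coprime to the modulus.");
        return ((t0 % m) + m) % m;
    }

    private static readonly long InverseMultiplier = ModInverse(Multiplier, Modulus);

    // Same arithmetic as Next: all products stay below 10^18, so long does not overflow.
    public static long Encode(long value) => (Multiplier * value + Increment) % Modulus;

    // Undo the increment first, then multiply by the inverse of the multiplier.
    public static long Decode(long code) =>
        (InverseMultiplier * ((code - Increment + Modulus) % Modulus)) % Modulus;
}
```

With this, `AffineCodes.Decode(AffineCodes.Encode(x))` returns `x` for any `x` in 0..999,999,998 in constant time, so no per-request scan of the sequence is needed.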