22

I need to generate a random ID, alphanumeric, 6 characters, as the ID for shortlink service.

Currently, I generate a random 6-character code, look up in the db to see if it has been used before, and if it has, repeat the process. I need it to be unique for all 36^6 combinations. As the system grows, the worse its performance.

Is there known good approach that minimizes hitting the db, preserves state globally, and will not take more than 100ms to lookup?

Thx for any help

metalaureate
  • 7,572
  • 9
  • 54
  • 93
  • What is the set of characters you want to draw from? – lurker Oct 05 '13 at 23:49
  • Are you limited to 6 only? Or would a 22 like "Xy0MVKupFES9NpmZ9TiHcw" will do? – Yuriy Galanter Oct 06 '13 at 00:00
  • 11
    Serious question, why do you have to have *random* IDs for a shortlink service? Generate sequential numerical IDs and convert them to base 36 using the digit set of your choice. – us2012 Oct 06 '13 at 00:04
  • I'd use Base64 (being careful to choose a length that wouldn't be padded). – Hot Licks Oct 06 '13 at 00:21
  • @HotLicks Base64ing a URL doesn't really shorten it though, so I don't quite see how that would make a good shortlink service. – us2012 Oct 06 '13 at 00:24
  • 1
    Base64 is standard, and more compact than base 36. 6 characters will get you 32 bits binary, which should be plenty for a "non-repeating" identifier. – Hot Licks Oct 06 '13 at 00:31
  • @HotLicks Oh, I see. You are suggesting base64 instead of a 36-letter alphabet, *not* as the single step of mangling the URL. Yeah, that sounds sensible. – us2012 Oct 06 '13 at 00:38
  • Maybe you want to consider bytes 17-23 of GUID. See this: http://jeffhandley.com/archive/2009/07/09/190.aspx – NoChance Oct 06 '13 at 00:44
  • It would probably be best to use `-` and `_` for the "extra" characters, vs `+` and `.`. And if he wants to obscure the sequentialness he can run the original number (before encoding) through encryption. – Hot Licks Oct 06 '13 at 00:45
  • 1
    @EmmadKareem - You're not going to get many different IDs out of that. – Hot Licks Oct 06 '13 at 00:46
  • @HotLicks, as a mater of fact, I made a quick program and ran it through. I got about 15000 unique values (an average of 3 runs). This is not great but it not bad given the ease of coding. – NoChance Oct 06 '13 at 03:04
  • Do you care about users being able to see how many ids you're generating each day? – Paul Hankin Oct 06 '13 at 06:40

2 Answers2

24

Use sequential numbers, so you will never have to hit the database searching for whether a key already exists. You just need to keep track of the maximum number assigned so far.

To make this sequential ID alphanumeric, convert it to base 36.

If you don't like the fact that the first IDs assigned will then look like 000000, 000001, ... 00000z, 000010,... , you can make use of a mathematical trick: Internally keep the sequential, numeric ID, but for the external representation visible to the user, multiply it (mod 36^6) by a large prime smaller than 36^6-1 - 1679979167 would be a decent choice - before converting to base 36. This will make your IDs appear random to the user even though they really aren't.

Here's a python sample with output:

def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyz"):
    return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])

for internalID in range(1,200):
    mangled = (internalID*1679979167)%(36**6)
    print internalID, mangled, baseN(mangled,36)

(baseN code is jellyfishtree's from here)


1 1679979167 rs7s7z
2 1183175998 jkfkfy
3 686372829 bcncnx
4 189569660 34v4vw
5 1869548827 ux2x3v
6 1372745658 mpapbu
7 875942489 ehihjt
8 379139320 69q9rs
9 2059118487 y1y1zr
10 1562315318 pu5u7q
11 1065512149 hmdmfp
12 568708980 9eleno
Community
  • 1
  • 1
us2012
  • 16,083
  • 3
  • 46
  • 62
  • (Trying to figure out the prime number trick.) So the idea is... you've made a reversible mapping from the internal integer to a representation that the user will see? I'm wondering what, then, the connected system will see... – Dogweather Oct 06 '13 at 00:27
  • Yeah, I want a reversible mapping from the range [0,36^6-1] to itself that will look 'sequential' numbers look not sequential to the untrained eye. The internal ID would just be the sequential number while everything outside, customers, users, etc would just see the mapped version. – us2012 Oct 06 '13 at 00:30
  • (Edited and fixed a problem - earlier version of code forgot to take result mod 36^6.) – us2012 Oct 06 '13 at 00:56
  • This is also a kick ass solution. Thank you! – metalaureate Oct 06 '13 at 14:16
  • 3
    How do you ensure that the generated numbers are unique? Since you are doing a mod, the numbers may start colliding. – inder Apr 22 '16 at 06:00
  • if for some reason you want them to appear non-sequential, but also guarantee that there are no collisions, consider using encryption (like AES) instead of hashing – Display Name Sep 05 '16 at 05:34
12

This problem has a well-known solution:

http://en.wikipedia.org/wiki/Linear_congruential_generator

Just generate the next random number with your LCG, then convert it to base 36 and write the corresponding pseudorandom string.

Good luck!

naitoon
  • 665
  • 5
  • 14
  • 1
    Yeah, this might actually be the best idea. My idea is somewhat overcomplicating it because I erroneously assumed that you need a quickly computable inverse from the mangled form to the sequential ID, but rethinking it, I don't think that's necessary. So +1 to you. – us2012 Oct 06 '13 at 00:32
  • It is somewhat like an LCG, only that it doesn't *iterate* the steps but multiplies with sequential IDs, which makes it easily reversible. – us2012 Oct 06 '13 at 00:37
  • Yes, it's definitely not the same thing, but primes are also involved. – naitoon Oct 06 '13 at 00:52
  • Thanks all - the quality and number of responses to my question was humbling. Thank you all so much! I'm going to go with LCG. – metalaureate Oct 06 '13 at 03:32