3

My webapplication has a table in the database with an id column which will always be unique for each row. In addition to this I want to have another column called code that will have a 6 digit unique Alphanumeric code with numbers 0-9 and alphabets A-Z. Alphabets and number can be duplicate in a code. i.e. FFQ77J. I understand the uniqueness of this 6 digit alphanumeric code reduces over time as more rows are added but for now I am ok with this.

Requirement (update) - The code should be at least of length 6 - Each code should be Alphanumeric

So I want to generate this Alphanumeric code.

Question

What is a good way to do this?

  • Should I generate the code and after the generation, run a query to the database and check if it already exists, and if so then generate a new one? To ensure the uniqueness, does this piece of code need to be synchronized so that only one thread runs it?
  • Is there something built-in to the database that will let me do this?

For the generation I will be using something like this which I saw in this answer

char[] symbols = new char[36];
char[] buf;
    for (int idx = 0; idx < 10; ++idx)
        symbols[idx] = (char) ('0' + idx);
    for (int idx = 10; idx < 36; ++idx)
        symbols[idx] = (char) ('A' + idx - 10);
public String nextString()
{
    for (int idx = 0; idx < buf.length; ++idx)
        buf[idx] = symbols[random.nextInt(symbols.length)];
    return new String(buf);
}
Community
  • 1
  • 1
Anthony
  • 33,838
  • 42
  • 169
  • 278
  • Do the ids need to be random? It's more efficient to use a (non-random) counter instead - you'll guarantee uniqueness without having to check to see if the value already exists – Zim-Zam O'Pootertoot Jul 14 '13 at 22:48
  • I presume you mean code? They need to look somewhat random so they are not easily guessable. – Anthony Jul 14 '13 at 22:50
  • Why? Why not just compute it from the ID every time you need it? You're de-normalizing your database by doing this. – user207421 Jul 14 '13 at 23:13
  • So, you want it to be at least 6 chars long, but it could be longer? – Alan Jul 14 '13 at 23:22

7 Answers7

3

I would simply do this:

String s = Integer.toString(i, 36).toUpperCase();

Choosing base-36 will use characters 0-9a-z for the digits. To get a string that uses uppercase letters (as per your question) you would need to fold the result to upper case.

If you use an auto increment column for your id, set the next value to at least 60,466,176, which when rendered to base 36 is 100000 - always giving you a 6 digit number.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
3

Since it's a requirement for the shortcode to not be guessable, you don't want to tie it to your uniqueID row ID. Otherwise that means your rowID needs to be random, in addition to unique. Starting with a counter 0, and incrementing, makes it pretty obvious when your codes are: 000001, 000002, 000003, and so forth.

For your short code, generate a random 32bit int, omit the sign and convert to base36. Make a call to your database, to ensure it's available.

You haven't explicitly called out scalability, but I think it's important to understand the limitations of your design wrt to scale.

At 2^31 possible 6 char base36 values, you will have collisions at ~65k rows (see Birthday Paradox questions)

From your comment, modify your code:

public String nextString()
{
    return Integer.toString(random.nextInt(),36);
}
Community
  • 1
  • 1
Alan
  • 45,915
  • 17
  • 113
  • 134
2

I would start with 0 for an empty table and do a

SELECT MAX(ID) FROM table

to find the largest id so far. Store it in an AtmoicInteger and convert it using toString

AtomicInteger counter = new AtomicInteger(maxSoFar);

String nextId = Integer.toString(counter.incrementAndGet(), 36);

or for padding. 36 ^^ 6 = 2176782336L

String nextId = Long.toString(2176782336L + counter.incrementAndGet(), 36).substring(1);

This will give you uniqueness and no duplicates to worry about. (it's not random either)

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • This would be ideal but in certain cases it will not give a code of length 6. For example if max ID is 3 then `Integer.toString(3, 36)` just returns `3`. – Anthony Jul 14 '13 at 23:01
  • 1
    You can add padding if you need it, added an example or you can start at 36 ^^ 5 – Peter Lawrey Jul 14 '13 at 23:03
  • 1
    This works, however there are some drawbacks. This app isn't going to be horizontally scalable, and the shortcode is easily guessable, since it's a monotonically increasing value. I know the OP didn't mention scalability, but in the comments, he did want the codes to be random. – Alan Jul 14 '13 at 23:08
  • I hate to change the question so much. But as I'm working through the problem I'm realizing more. I would want each code to be Alphanumeric. Converting to base 36 will sometimes give me 000003 etc. – Anthony Jul 14 '13 at 23:22
  • @Anthony 000003 is alpha numeric. Do you mean you want at least one alpha and at least one digit? – Peter Lawrey Jul 15 '13 at 07:18
  • If so, you can generate new ids, until one satisfies your criteria. – Peter Lawrey Jul 15 '13 at 07:19
1

Simply, you can use Integer.toString(int i, int radix). Since you have base 36(26 letters+10 digits) you set the radix to 36 and i to your integer. For example, to use 16501, do:

 String identifier=Integer.toString(16501, 36);

You can uppercase it with .toUpperCase()

Now onto your other questions, yes, you should query the database first to ensure it doesn't exist. If depending on the database, it may need to be synchronized, or it may not be as it'll use its own locking system. In any case, you'd need to tell us which database.

On the question of whether there's a builtin, we'd need to know the DB type as well.

nanofarad
  • 40,330
  • 4
  • 86
  • 117
1

To create a random but unique value within a small range here are some ideas I know of:

  1. Create a new random value and try to insert it.

    Let a database constraint catch violations. This column should also likely be indexed. The DML may need to be tried several times until a unique ID is found. This will lead to more collisions as time progresses, as noted (see the birthday problem).

  2. Create a "free IDs" table ahead of time and on usage mark the ID as being used (or delete it from the "free IDs" table). This is similar to #1 but shifts when the work is done.

    This allows the work of finding "free IDs" to be done at another time, perhaps during a cron job, so that there will not be a contraint violation during the insert keeping the insert itself the "same speed" throughout the usage of said domain. Make sure to use transactions.

  3. Create a 1-to-1/injective "mixer" function such that the output "appears random". The point is this function must be 1-to-1 to inherently avoid duplicates.

    This output number would then be "base 36 encoded" (which is also injective); but it would be guaranteed unique as long as the input (say, an auto-increment PK) was unique. This would likely be less random than the other approaches, but should still create a nice-looking non-linear output.

    A custom injective function can be created around an 8-bit lookup table fairly trivially - just process a byte at a time and shuffle the map appropriately. I really like this idea, but it can still lead to somewhat predictable output

To find free IDs, approaches #1 and #2 above can use "probing with IN" to minimize the number of SQL statements used. That is, generate a bunch of random values and query for them using IN (keeping in mind what sizes of IN your database likes) and then see which values were free (as having no results).

To create a unique ID not constained to such a small space, a GUID or even hashing (e.g. SHA1) might be useful. However, these only guarantee uniqueness because they have 126/160-bit spaces so that the chance of collision (for different input/time-space) is currently accepted as improbable.


I actually really like the idea of using an injective function. Bearing in mind that it is not good "random" output, consider this pseudo-code:

byte_map = [0..255]

map[0] = shuffle(byte_map, seed[0])
..
map[n] = shuffle(byte_map, seed[1])

output[0] = map[0][input[0]]
..
output[n] = map[n][input[n]]

output_str = base36_encode(output[0] .. output[n])

While a very simple setup, numbers like 0x200012 and 0x200054 will still share common output - e.g. 0x1942fe and 0x1942a9 - although the lines will be changed a bit due to the later application of the base-36 encoding. This could probably be further improved to "make it look more random".

user2246674
  • 7,621
  • 25
  • 28
0

For efficient usage, try caching generated code in a HashSet<String> in your application:

HashSet<String> codes = new HashSet<String>();

This way you don't have to make a db call every time to check whether the generated code is unique or not. All you have to do is:

codes.contains(newCode);

And, yes, you should synchronize your method which updates the cache

public synchronize String getCode ()
{
    String newCode = "";
    do {
        newCode = nextString();
    }
    while(codes.contains(newCode));
    codes.put(newCode);
}
0

You mentioned in your comments that the relationship between id and code should not be easily guessable. For this you basically need encryption; there are plenty of encryption programs and modules out there that will perform encryption for you, given a secret key that you initially generate. To employ this approach, I would recommend converting your id into ascii (i.e., representing as base-256, and then interpreting each base-256 digit as a character) and then running the encryption, and then converting the encrypted ascii (base-256) into base 36 so you get your alpha-numeric, and then using 6 randomly chosen locations in the base 36 representation to get your code. You can resolve collisions e.g. by just choosing the nearest unused 6-digit alpha-numeric code when a collision occurs, and noting the re-assigned alpha-numeric code for the id in a (code <-> id) table that you will have to maintain anyway since you cannot decrypt directly if you only store 6 base-36 digits of the encrypted id.

user2566092
  • 4,631
  • 15
  • 20