1

I have a number of records in a table and need to find a way to assign some unique codes that

  • Should be unique;
  • Should consist of lowercase letters only (no digits) so it's easy for people to mention record codes over telephone using a phonetic alphabet eg - "Hi, this is John, my code is: alpha zulu whisky foxtrot";
  • Should be as short as possible.

Records have an expiration date of about 2 weeks, then they are deleted.

New records are created constantly at a rate of several thousand per day. The table gets a couple thousand selects per second.

Ideally this should happen in SQL only so I can add it to a "before insert" trigger and not hit the database with selects to check for uniqueness of application-side generated codes, which is what I'm playing with now and can come with significant overhead, I won't even post the code here.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
Nick M
  • 2,424
  • 5
  • 34
  • 57
  • Could you tell us some about what these codes are used for? – Schwern Mar 18 '16 at 22:49
  • A possible duplicate of [mysql unique number generation](https://stackoverflow.com/questions/4382185/mysql-unique-number-generation). – Schwern Mar 18 '16 at 22:58
  • The codes are used for agents who reach to a call center to identify certain records. We don't want them to have to remember or write down digits or alphanumeric codes. We've ran tests, it's best when the codes are made of letters only and kept as short as possible. Part of this happens because the phone lines are often quite bad (spelling a 10 character alphanumeric code over a phone line with noise/delay => guaranteed annoyance and time waste). This question is not a duplicate, the first 20-25 or so search results, all other questions refer to uuids or random integers and so on. Thanks – Nick M Mar 19 '16 at 09:54
  • Random integers can be represented as a sequence of lowercase letters in base 26. – Schwern Mar 19 '16 at 18:05

1 Answers1

2

Turn the integer primary key into a base 26 number... aka letters.

0  becomes a
1  becomes b
25 becomes z
26 becomes aa
27 becomes ab

There's no need to store this representation in the database, store it as an integer and convert it to letters at the display level.

Because the primary key increments this isn't particularly secure, it can be guessed.


If you wish to have the code expire, but not the associated data, create a second table which stores the code, when it was created, and a foreign key back to the data. Use an INSERT trigger to add a code.

CREATE TABLE user_codes (
    code     INTEGER PRIMARY KEY AUTO_INCREMENT,
    uid      INTEGER FOREIGN KEY users(id),
    created  TIMESTAMP 
);

As above, turn that integer code into base 26 where the digits are all letters.

This also protects the user's primary key, someone might be able to do something malicious with that maybe, better to not leak it.

This also isn't particularly secure and can be guessed. You can improve on it by using the technique in this answer to pick unique random numbers.


If it must be random and unique, consider using a UUID_SHORT() call to generate a unique 64 bit number, which is basically the time plus a server ID. Then convert that to base 26 and there to letters. Unfortunately this will require something like 14 letters.

...but because it's based on time this also isn't particularly secure and can be guessed.


Finally, ask yourself if the code must be globally unique, or if it's enough for the person to know the code for a particular account. For example, a verification code.

Community
  • 1
  • 1
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Base64 representations include digits, we do not want digits, we want numbers only. In addition, if we convert our primary key to base 26 we will end up with some 6 characters which is more than we want, at any point in time we have around 60,000 to 100,000 records in our table, so we don't want to generate our codes based on the primary key as the codes would be too long. Keeping the code private is not a concern, there is separate business logic for that. uuid_short() returns too many characters. The code must be unique in the table, it is not associated to an account or anything similar. – Nick M Mar 19 '16 at 09:50
  • By the way "we have around 60,000 to 100,000 records in our table" means these are the searchable records. We do not expunge them from the table when expired, we just change a status field to zero and remove the code. Added this as clarification, in case you would suggest resetting the primary key every now and then to keep the IDs shorter... – Nick M Mar 19 '16 at 09:58
  • @NickM You're right about base64... which is why I never brought it up. As for base 26, that's the best you can do if you only want to use lowercase letters. To cover 100,000 entries you need at least 4 letters (`26^4`). [Use the technique here](http://stackoverflow.com/a/4382586/14660) to create a sufficiently large random number and then convert it to base 26 letters. – Schwern Mar 19 '16 at 18:08
  • Sorry, I meant base 26 not Base64... typo or autocomplete or both. Thanks. – Nick M Mar 19 '16 at 21:56
  • @NickM I believe you're thinking base 26 must be done like hex, 0-9 and then letters for the extra 16 digits. Instead, forgo the numbers and use just a-z for the digits. (I realize I left 0 off on my example and I've fixed it) – Schwern Mar 19 '16 at 22:22
  • I tried it, it can generate collisions if we remove the digits or replace them with letters. – Nick M Mar 20 '16 at 12:40
  • @NickM If you have no collisions with numbers, and if you're transcribing correctly, that shouldn't be the case. Can you give an example? – Schwern Mar 20 '16 at 18:20
  • I think I'll try with base 26 and improve this later depending on user feedback... at least this will sort out american and european users who will know how to dictate the digits properly over the phone :)) Thank you so much for this suggestion! – Nick M Mar 21 '16 at 14:14