4

In our DB, every Person has an ID, which is the DB generated, auto-incremented integer. Now, we want to generate a more user-friendly alpha-numeric ID which can be publicly exposed. Something like the Passport number. We obviously don't want to expose the DB ID to the users. For the purpose of this question, I will call what we need to generate, the UID.

Note: The UID is not meant to replace the DB ID. You can think of the UID as a prettier version of the DB ID, which we can give out to the users.

  • I was wondering if this UID can be a function of the DB ID. That is, we should be able to re-generate the same UID for a given DB ID.
  • Obviously, the function will take a 'salt' or key, in addition to the DB ID.
  • The UID should not be sequential. That is, two neighboring DB IDs should generate visually different-looking UIDs.
  • It is not strictly required for the UID to be irreversible. That is, it is okay if somebody studies the UID for a few days and is able to reverse-engineer and find the DB ID. I don't think it will do us any harm.
  • The UID should contain only A-Z (uppercase only) and 0-9. Nothing else. And it should not contain characters which can be confused with other alphabets or digits, like 0 and O, l and 1 and so on. I guess Crockford's Base32 encoding takes care of this.
  • The UID should be of a fixed length (10 characters), regardless of the size of the DB ID. We could pad the UID with some constant string, to bring it to the required fixed length. The DB ID could grow to any size. So, the algorithm should not have any such input limitations.

I think the way to go about this is:

Step 1: Hashing.

I have read about the following hash functions:

The hash returns a long string. I read here about something called XOR folding to bring the string down to a shorter length. But I couldn't find much info about that.

Step 2: Encoding.

I read about the following encoding methods:

  • Crockford Base 32 Encoding
  • Z-Base32
  • Base36

I am guessing that the output of the encoding will be the UID string that I am looking for.

Step 3: Working around collisions.

  • To work around collisions, I was wondering if I could generate a random key at the time of UID generation and use this random key in the function.
  • I can store this random key in a column, so that we know what key was used to generate that particular UID.
  • Before inserting a newly generated UID into the table, I would check for uniqueness and if the check fails, I can generate a new random key and use it to generate a new UID. This step can be repeated till a unique UID is found for a particular DB ID.

I would love to get some expert advice on whether I am going along the correct lines and how I go about actually implementing this.

I am going to be implementing this in a Ruby On Rails app. So, please take that into consideration in your suggestions.

Thanks.

Update

The comments and answer made me re-think and question one of the requirements I had: the need for us to be able to regenerate the UID for a user after assigning it once. I guess I was just trying to be safe, in the case where we lose a user's UID and we will able to get it back if it is a function of an existing property of the user. But we can get around that problem just by using backups, I guess.

So, if I remove that requirement, the UID then essentially becomes a totally random 10 character alphanumeric string. I am adding an answer containing my proposed plan of implementation. If somebody else comes with a better plan, I'll mark that as the answer.

Community
  • 1
  • 1
Anjan
  • 1,613
  • 1
  • 19
  • 25
  • create a number based on a year, month, day (and even time if you need that) and followup YYYYMMDD/NN? Where NN could also be a alpha/numeric – Roger Mar 07 '12 at 12:20
  • Are you suggesting that I use just a timestamp + an alphanumeric constant as the UID? That is a little too obvious for my liking. I don't want the users to be able to guess the logic behind the IDs so easily. – Anjan Mar 08 '12 at 10:09
  • Well its not clear what info (assuming its a URL) is behind there. What about the way SO does it? A numberer + name (623581/anjan), that combo is hard to guess (not impossible). – Roger Mar 08 '12 at 10:15
  • As I mentioned in the question, this UID is analogous to a passport number. We have tens of thousands of users in the DB and I obviously don't want to use a name in the UID. Each user should be assigned this UID and this is what will be given to them, to identify themselves. These users are actually students who are enrolled in our DB by the school, so the students don't get to choose unique "usernames" like traditional signup systems. We have to auto-assign a unique ID. I am starting to lean towards pre-generating all possible combinations of 10 alphanumeric chars and picking them at random. – Anjan Mar 08 '12 at 10:40
  • I did read you question ;-) but sometimes thinking more outside the box shines a different light on the matter. – Roger Mar 08 '12 at 10:54
  • @Rogier - Thank you. You did get me re-thinking and questioning some of my requirements. I have updated the question accordingly. – Anjan Mar 09 '12 at 14:09

2 Answers2

2

As I mentioned in the update to the question, I think what we are going to do is:

  • Pre-generate a sufficiently large number of random and unique ten character alphanumeric strings. No hashing or encoding.
  • Store them in a table in a random order.
  • When creating a user, pick the first these strings and assign it to the user.
  • Delete this picked ID from the pool of IDs after assigning it to a user.
  • When the pool reduces to a low number, replenish the pool with new strings, with uniqueness checks, obviously. This can be done in a Delayed Job, initiated by an observer.
  • The reason for pre-generating is that we are offloading all the expensive uniqueness checking to a one-time pre-generation operation.
  • When picking an ID from this pool for a new user, uniqueness is guaranteed. So, the operation of creating user (which is very frequent) becomes fast.
Anjan
  • 1,613
  • 1
  • 19
  • 25
0

Would db_id.chr work for you? It would take the integers and generate a character string from them. You could then append their initials or last name or whatever to it. Example:

user = {:id => 123456, :f_name => "Scott", :l_name => "Shea"}
(user.id.to_s.split(//).map {|x| (x.to_i + 64).chr}).join.downcase + user.l_name.downcase

#result = "abcdefshea"
ScottJShea
  • 7,041
  • 11
  • 44
  • 67
  • 1
    Thank you for your idea. I don't want to use name because it can be changed by the user. Moreover: `(123456.to_s.split(//).map {|x| (x.to_i + 64).chr}).join.downcase` and `(123457.to_s.split(//).map {|x| (x.to_i + 64).chr}).join.downcase` result in `abcdef` and `abcdeg`. This is too similar to each other; sequential. As I said in my question: two neighboring DB IDs should generate visually different-looking UIDs. – Anjan Mar 08 '12 at 10:01