0

I want to generate a unique id which will be used in URLs to identify a specific resource. In principle it is the same as pastebin.com etc. does.

The id and resource is not very secret but I want it to be so you just can't decrement a id and then get another users resource. I´m thinking of a CHAR(8) which will look nice in a URL and still be large enough to reduce the chance of guesses. But how do I generate this? For an INT, you can use auto_increment and primary key to insure the uniqueness.

But if I do the following in order

  1. Generate a CHAR(8) in my application
  2. Insure that this ID doesn't exists.
  3. If it not exists, store, else goto 1.

I have to wrap 2. and 3. in a atomic transaction.

But is there a better way? or shouldn't I care about the the check (2.) because a clash doesn't occur regularly. I use MySql and .Net (C#) if that helps. Is it possible to somehow 'encrypt' a auto-incremented int as the text-id and decrypt it again in precisely 8 (or 10) characters.

I have read Need a smaller alternative to GUID for DB ID but still unique and random for URL which was useful, but the use of GUID is not supported in MySql (as far as I know). But a comment on the quality of the LongToShortCode method in the thread would also be appreciated.

Note: the resources can't be changed, only viewed.

Best regards, Lasse

Community
  • 1
  • 1
Lasse Espeholt
  • 17,622
  • 5
  • 63
  • 99
  • Its unclear if you are trying to generate this id in C# or in mysql directly. To avoid the possibility of duplication, you can prepend the database row id to the unique string, so that no 2 would be the same... yet it still maintains the inability to just decrement the id to get the previous entry. – GrandmasterB Aug 26 '10 at 18:31
  • @GrandmasterB It is not important where the ID is generated but I prefer it to be generated as close to the database as possible. "you can prepend the database row id to the unique string" - will you explain more in depth of what you mean? – Lasse Espeholt Aug 26 '10 at 18:41
  • If you have a random id, ABCDEFG, and want to make sure it's unique in the database, add the unique row id to it, 1234-ABCDEFG, where 1234 is the row id. Then, you dont have to worry about collisions. It will, though, require that you add the random id to the database after the row is created so that you could get said ID. Or, you can just output it to the user as 1234-ABCDEFG, even if in the database its row 1234 value ABCDEFG - just split it at the "-" before querying. – GrandmasterB Aug 26 '10 at 18:50
  • @GrandmasterB I guess there would be a problem accessing the resource again. There could be many rows with the same ending text id. – Lasse Espeholt Aug 26 '10 at 18:54
  • Not if the unique row id was attached to it as part of the text id. – GrandmasterB Aug 26 '10 at 19:26
  • Sure, but how would you retrieve the row `1234-ABCDEFG` with just `ABCDEFG`? There could also be a row called `2422-ABCDEFG`. – Lasse Espeholt Aug 26 '10 at 19:45

4 Answers4

1

MySql implements the UUID. Which appears to be a GUID with a different name. So that option is still available for you.

If you are still going to use char(8), then you do need to worry about uniqueness of your ID, simply because if you're looking at served URLs, you may not know a violation has occurred until people start reporting problems.

AllenG
  • 8,112
  • 29
  • 40
  • The UUID/GUID is too large, and is difficult the strip down (look the other stackoverflow thread). The resource can't be changed, only viewed, so that will not be a problem. In essence, I will provide something similar to e.g. pastebin.com – Lasse Espeholt Aug 26 '10 at 18:36
1

You could use an int identity and then encrypt/decrypt it before using it, probably not the best idea under heavy load though.

Paul Creasey
  • 28,321
  • 10
  • 54
  • 90
0

The easyish way of doing that pseudo-atomically is to

  1. generate the random string
  2. store the string (in effect reserving it)
  3. check if another one exists
  4. if another one exists, remove the one you just made, and return to step 1

Collisions can still happen, but when they do it causes both threads to try again, which in this case isn't a problem.

EDIT: I would suggest taking the first few characters of a cryptographic hash or something for your generation function, but it doesn't really matter.

zebediah49
  • 7,467
  • 1
  • 33
  • 50
0

I think I'll do it like this: A 8 character text id can store a number up to 64^8 = 2^48.

I will then use two columns:

  • ID, INT 2^32 auto-increment
  • Rand, INT 2^16

Then, when I add a row, I will generate a random 2^16 integer and put it in a new row. The text id is then simply generated from from the two numbers combined. And retrieval is easy two - just splitting it up and a simple lookup in the database. Ridiculous simple solution which should eliminate row clashes and be random enough (2^16) to reduce guesses.

Feedback on this approach will be appreciated.

Lasse Espeholt
  • 17,622
  • 5
  • 63
  • 99