18

We use UUIDs as the primary keys in our database (generated by PHP, stored in MySQL). The problem is that when someone wants to edit something or view their profile, there is a huge, scary, ugly UUID string at the end of the URL (edit?id=.....).

Would it be safe (read: still unique) if we only used the first 8 characters, everything before the first hyphen?

If it is NOT safe, is there some way to translate it into something else shorter for use in the url that could be translated back into the hex to use as a lookup? I know that I can base64 encode it to bring it down to 22 characters, but is there something even shorter?

EDIT: I have read this question, and it said to use base64. Again, is there anything shorter?

helloandre
  • Are you hoping that all but the first 8 characters of the UUID are superfluous? They're there to make it unique. Either way, if it's 8 instead of 22 characters, you really think that will make a friendlier user experience? I wouldn't spend time worrying about that. I've seen crazier URIs in the location bar and it certainly doesn't affect the site's usability. – webbiedave Dec 30 '10 at 16:10
  • I was thinking of something similar to how an MD5 hash can be shortened (I read this somewhere but can't remember where), because the characters have a uniform distribution over any given substring. – helloandre Dec 30 '10 at 16:19
  • I see. Crypto hash functions are expected to have acceptable levels of collision occurrence and truncating the output string merely increases that chance. Your situation cannot afford any chance of "collision" (one id pointing to numerous records). – webbiedave Dec 30 '10 at 16:31

6 Answers

17

Shortening the UUID increases the probability of a collision. You can do it, but it's a bad idea. Using only 8 characters means just 4 bytes of data, so you'd expect a collision once you have about 2^16 IDs - far from ideal.
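The 2^16 figure comes from the birthday bound. As a rough sketch, assuming the 32 kept bits are uniformly random (which is not true of every UUID version; in a version 1 UUID the first 8 hex characters come from a timestamp), the chance of at least one collision among n IDs can be estimated as below. The function name is made up for illustration.

```python
import math

def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids values collide when
    each value is drawn uniformly at random from 2**bits possibilities."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))

# 8 hex characters carry 32 bits.
p_8_chars = collision_probability(2 ** 16, 32)
# A version 4 UUID carries 122 random bits.
p_full = collision_probability(2 ** 16, 122)
```

Under these assumptions, `p_8_chars` is a little under 0.4 at 65,536 IDs, while `p_full` is vanishingly small for the same number of IDs.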

Your best option is to take the raw bytes of the UUID (not the hex representation) and encode it using base64. Or, just don't worry much, because I seriously doubt your users care what's in the URL.
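A minimal sketch of that encoding, written in Python rather than the asker's PHP, and assuming the URL-safe base64 alphabet with the `=` padding stripped (a choice made here so the token can sit in a query string; the helper names are hypothetical):

```python
import base64
import uuid

def uuid_to_short(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

def short_to_uuid(s: str) -> uuid.UUID:
    """Reverse uuid_to_short: restore the padding, decode, rebuild the UUID."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))

original = uuid.uuid4()
token = uuid_to_short(original)      # 22 characters
restored = short_to_uuid(token)      # equal to original
```

Nothing is lost in the round trip: the token is just another spelling of the same 128 bits, so the full UUID remains the lookup key.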

Nick Johnson
10

Don't cut a single bit out of that UUID: you have no control over the algorithm that produced it, there are multiple possible implementations, and the implementation is subject to change (for example, it may change with the version of PHP you're using).

If you ask me, a UUID in the address bar doesn't look scary or difficult at all. Even a simple Google search for "UUID" produces worse-looking URLs, and everybody's used to looking at Google URLs!

If you want nicer-looking URLs, take a look at the address bar of this stackoverflow.com question. They're using the question ID followed by the title of the question. Only the ID part is relevant; everything else is there to make it easy on the eyes of readers (go ahead and try it: you can delete anything after the ID, or replace it with junk, and it doesn't matter).
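A small sketch of that "ID plus decorative title" idea. The `/questions/<id>/<slug>` shape and both function names are assumptions for illustration, not Stack Overflow's actual routing code:

```python
import re
from typing import Optional

# Hypothetical URL shape: /questions/<numeric id>/<optional slug>
QUESTION_PATH = re.compile(r"^/questions/(\d+)(?:/.*)?$")

def question_id_from_path(path: str) -> Optional[int]:
    """Return the ID from the path, ignoring whatever follows it."""
    match = QUESTION_PATH.match(path)
    return int(match.group(1)) if match else None

def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug for display."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

slug = slugify("Is it safe to truncate a UUID?")
with_slug = question_id_from_path("/questions/123/" + slug)
with_junk = question_id_from_path("/questions/123/anything-at-all")
```

Both lookups yield the same ID, because the slug is never used to find the record.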

Cosmin Prund
3

It is not safe to truncate UUIDs. Also, they are designed to be globally unique, so you aren't going to have luck shortening them. Your best bet is either to assign each user a unique number, or to let users pick a custom (unique) string (like a username or nickname) that can be mapped back to the UUID. So you could have edit?id=.... or edit?name=blah, and your script then looks up the UUID for that name.

Zeki
  • Picking custom names would work, but it's not always a user profile being edited. It could be something a user creates with nothing enforceably unique (think photo album) other than the UUID. – helloandre Dec 30 '10 at 16:13
1

It depends on how you're generating the UUID - if you're using PHP's uniqid then it's the right-most digits that are more "unique". However, if you're going to truncate the data, then there's no real guarantee that it'll be unique anyway.

Irrespective, I'd say that this is a somewhat sub-optimal approach - is there no way you can use a unique (and ideally meaningful) textual reference string instead of an ID in the query string? (Hard to know without more knowledge of the problem domain, but it's always a better approach in my opinion, even if SEO, etc. isn't a factor.)

If you were using this approach, you could also let MySQL generate the unique IDs, which is probably a considerably more sane approach than attempting to handle this in PHP.

John Parker
1

If you're worried about scaring users with the UUID in the URL, why not write it out to a hidden form field instead?

John Rasch
  • i wanted to find some compromise between this (naked url) and having some visual cue in the url of what you were editing. might end up with this approach tho. – helloandre Dec 30 '10 at 16:12
0

Old question but I think that it should be mentioned that "short ids" are a common practice exactly to present more human-friendly codes and they're not meant to replace the full ids entirely. Also, this is valid for any identifier, be it a number, UUID, SHA, or whatever.

As other answers already mentioned, you should always use the full UUID as the de facto key for your records.

The implementation of short ids will vary greatly depending on your needs but two things are really common across implementations:

  1. The interfaces/systems dealing with short ids do not resolve ambiguity silently. (note that collisions can happen but not be ambiguous depending on the context)
  2. Users can choose which to use in a transparent fashion, the short id or the full id.

Here are some common implementations:

  • Generate two ids for each resource, where one of the ids is an incremental integer. This is the approach I've seen the most, as it is the simplest and dispenses with the two points I've just mentioned: collisions will never occur, and you only use the integer-based id in end-user interfaces.
  • Allow any short form of the original id, but either return an error when the short id is ambiguous or return all matches. Git commits are an example, although they use SHA hashes instead of UUIDs.
  • Use a fixed number of chars to extract a short id but increase the number of chars as they collide and use the length of the short id as information to resolve collisions. It's good if the context allows the invalidation of old records so you can also come back to shorter ids as they're freed up.
  • Use a fixed number of chars as a short id and combine it with context information to minimize ambiguity, like reducing the search scope to only the resources a user is allowed to access. Use this approach with caution: if the number of valid resources keeps increasing past a certain threshold (which depends on the number of chars in the short id), new records will start conflicting with old ones all the time. An example fit for this approach would be insurance records, where you can expect the insurance to expire after some time, allowing the system to handle conflicts gracefully: enter the first N digits to search for active insurances, but make a separate search requiring all the digits to search for all or inactive ones.
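The second implementation above (accept any prefix, refuse to guess when it is ambiguous) can be sketched roughly like this. It works on an in-memory list to stay self-contained; the exception class, function name, and sample ids are all made up for illustration, and a real system would query the database instead.

```python
from typing import Iterable

class AmbiguousShortId(Exception):
    """Raised when a short id matches more than one full id."""

def resolve_short_id(prefix: str, full_ids: Iterable[str]) -> str:
    """Return the single full id starting with prefix, or raise."""
    wanted = prefix.lower()
    matches = [full_id for full_id in full_ids if full_id.startswith(wanted)]
    if not matches:
        raise KeyError(f"no id starts with {prefix!r}")
    if len(matches) > 1:
        # Never pick one silently: report the ambiguity to the caller.
        raise AmbiguousShortId(f"{prefix!r} matches {len(matches)} ids")
    return matches[0]

# Made-up sample ids; the first two share their first 8 characters.
ids = [
    "3f2a9c1e-0b7d-4e5a-9c3f-1a2b3c4d5e6f",
    "3f2a9c1e-ffff-4e5a-9c3f-1a2b3c4d5e6f",
    "a1b2c3d4-0b7d-4e5a-9c3f-1a2b3c4d5e6f",
]
unique_hit = resolve_short_id("a1b2", ids)
longer_hit = resolve_short_id("3f2a9c1e-0", ids)
```

Here `"3f2a9c1e"` alone would raise `AmbiguousShortId`, and the user would have to supply more characters or the full id.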

As a bonus, if you just want to display lettered codes instead of numbers you can encode number-based ids as codes with libs like hashids.
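To show the general idea of turning a numeric id into a lettered code and back, here is a plain base-62 sketch. This is not the hashids algorithm or its API, just a stand-in with hypothetical function names:

```python
import string

# 0-9, a-z, A-Z: 62 symbols in total.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))

def decode_base62(code: str) -> int:
    """Reverse encode_base62."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

code = encode_base62(123456789)
number = decode_base62(code)
```

Unlike a truncated UUID, such a code is reversible, so it identifies exactly one integer id.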

Rodrigo Oliveira