11

Standard UUIDs are long, and you can't select the whole thing by double-clicking.

e.g. 123e4567-e89b-12d3-a456-426655440000

I like shorter IDs.

I like being able to double-click an ID to select it.

My question is: are there any issues with encoding a standard UUID as a 22(ish)-character base62 alphanumeric string?

e.g. 71jbvv7LfRKYp19gtRLtkn

EDIT: Added Context
We need the IDs for general data storage in NoSQL services such as DynamoDB. Collisions must not happen, but my understanding is that the collision risk with UUIDs is negligible. Standard UUIDs would suit our needs, so what I'm asking is: does encoding in base62 introduce any difference, extra risk, or unforeseen issue that doesn't exist with standard UUIDs?

Thanks.

MariuszS
JeremyTM
  • Depends what they are for, where you are storing them and if you care about collision. More context? – syncdk Mar 07 '17 at 22:11
  • I've added context above. – JeremyTM Mar 07 '17 at 23:01
  • Remember that a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) is a 128-bit value, ***not* a String**. We use strings to display a UUID to humans. The canonical textual representation is a [hexadecimal](https://en.wikipedia.org/wiki/Hexadecimal) string with four hyphens interspersed, 32 + 4 = 36 characters. But you can generate any textual representation you wish as long as both sender and receiver understand it and can recover the 128 bits. And you can omit the hyphens, as mentioned by dabest1, as they merely make the hex more readable by humans and more recognizable as a UUID. – Basil Bourque Mar 08 '17 at 00:17
  • Also note that some databases such as [Postgres](https://www.postgresql.org) support UUID natively as a data type, efficiently storing only the 128 bits of the underlying value rather than a string. – Basil Bourque Mar 08 '17 at 00:26
  • Thanks Basil, this is the kind of information I was looking for :) – JeremyTM Mar 08 '17 at 01:21
  • Using the native 16-byte data type (if available) has the further advantage that searching has no "case-insensitivity" problem. – martinstoeckli Mar 15 '17 at 12:12
  • Interesting @martinstoeckli, I don't need to worry about case - at least I think I don't? What is an example of a case insensitivity problem? – JeremyTM Mar 16 '17 at 20:28
  • We are working with an application that stores the 36-character representation of the GUID, and in some databases (Oracle) the default string comparison is case-sensitive. This can make it difficult to write a correct SQL query. – martinstoeckli Mar 16 '17 at 21:07

3 Answers

5

I think it's a good idea and I'm strongly considering it myself for my current project.

But only for external representation, not for internal storage.

Indeed, UUIDs are fundamentally just 128-bit integers, i.e. an array of 16 bytes.

For efficient DB storage, they should be stored in their binary form (e.g. a BINARY(16) column in MySQL). It will save space (16 bytes vs 36 bytes for the usual text representation, or 22 bytes for Base62), and perform faster when querying or indexing (strings don’t sort as fast as numbers because they rely on collation rules).
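The size figures above can be checked with a short sketch. It uses Python's standard `uuid` module rather than anything named in this answer, and the variable names are only illustrative:

```python
import math
import uuid

# The example UUID from the question.
u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

raw = u.bytes        # 16-byte binary form, e.g. for a BINARY(16) column
canonical = str(u)   # 32 hex digits plus 4 hyphens
hex_only = u.hex     # the same hex digits without the hyphens

# Number of base62 characters needed to hold any 128-bit value.
base62_length = math.ceil(128 / math.log2(62))

sizes = {
    "binary": len(raw),
    "canonical": len(canonical),
    "hex": len(hex_only),
    "base62": base62_length,
}
print(sizes)

# The binary form converts back to the same UUID, so nothing is lost by storing it.
restored_from_bytes = uuid.UUID(bytes=raw)
```

Whichever text encoding is shown to users, the stored value can stay at 16 bytes.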

The canonical representation is a hexadecimal encoding, with the 8-4-4-4-12 grouping, based on the semantic meaning of each group of bytes (meaning which we don't care about in most cases).

But it is just a convention, and not human-friendly at all. So I think a different encoding such as Base62 is totally acceptable, to be exposed where human interaction happens (e.g. in URLs), or for interfaces or storage systems that are text-based anyway (HTTP APIs for example, or file storage in CSV/JSON/XML...).

Internally your application should use them in binary form. I don't know about PHP, but Java, for example, has the java.util.UUID class.
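As a rough illustration of such an encoding, here is a minimal Python sketch of a Base62 round trip. The alphabet ordering (digits, then uppercase, then lowercase), the fixed width of 22 and the function names are all assumptions made for this example; they are not a standard and not necessarily what any particular library does.

```python
import uuid

# Assumed alphabet ordering; any ordering works as long as
# the encoder and the decoder agree on it.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62
WIDTH = 22            # enough base62 digits for any 128-bit value


def uuid_to_base62(u: uuid.UUID) -> str:
    """Encode the UUID's 128-bit integer value as a fixed-width base62 string."""
    n = u.int
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Left-pad with the zero digit so every ID has the same length.
    return "".join(reversed(digits)).rjust(WIDTH, ALPHABET[0])


def base62_to_uuid(text: str) -> uuid.UUID:
    """Decode a base62 string back into a UUID (raises ValueError on bad input)."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    return uuid.UUID(int=n)


original = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
short_id = uuid_to_base62(original)
print(short_id, base62_to_uuid(short_id) == original)
```

The encoding is lossless, so the collision risk is the same as for the underlying UUID. One difference from hex is that a Base62 string is case-sensitive, so it should only be stored or compared in places that preserve case.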

For Java there's also a really nice library that makes conversion between raw UUID and Base62 text representation very easy:

https://github.com/Devskiller/friendly-id


Pierre Henry
3

Base-62 is not as standardized as base-64, but base-64 has two extra symbols that may prevent selecting the whole thing by double-clicking.

How about just removing the dashes (-)? That would make it shorter than the original, and it would be easily selectable by double-clicking.
Example:
123e4567e89b12d3a456426655440000
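
For illustration, dropping the dashes is a one-liner in Python's standard `uuid` module, which also parses the dash-free form back. This is only a sketch of the idea, and the variable names are made up for the example:

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

compact = u.hex                # 32 hex characters, no dashes
restored = uuid.UUID(compact)  # the constructor also accepts the dash-free form

print(compact)
```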

Update:
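There are two common alphabets for base-64: the standard one, [a-zA-Z0-9+/], and the URL-safe one, [a-zA-Z0-9_-]. The latter helps with the selection issue, although the "-" character can still break double-click selection in some applications, just as it does in the canonical UUID form.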
On the other hand, I think base-62 is more widely used than I originally thought. Here is a nice blog on the topic of using base-62: http://blog.birdhouse.org/2010/10/24/base62-urls-django/
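
For comparison, the URL-safe base-64 variant mentioned above can be sketched with Python's standard `base64` module. The helper names are hypothetical, and stripping the trailing `=` padding is a choice made for this example:

```python
import base64
import uuid


def uuid_to_b64url(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes with the URL-safe alphabet and drop the '=' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def b64url_to_uuid(text: str) -> uuid.UUID:
    """Restore the padding that was stripped, then decode back to a UUID."""
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


original = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
token = uuid_to_b64url(original)
print(token, b64url_to_uuid(token) == original)
```

The result is 22 characters, the same length as base-62, but it may contain "-" or "_", whereas base-62 output is purely alphanumeric.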

dabest1
  • Thanks for taking the time @dabest1 - It's still quite long. Let's say the goal was to be as short as possible, with the same collision negligibility of a standard UUID. – JeremyTM Mar 07 '17 at 23:14
3

The solution to your problem is often called Url62, and some projects already use this convention: they convert a plain UUID to Base62 format.

If you are developing in Java, take a look at the FriendlyId project: https://github.com/Devskiller/friendly-id

Further reading on this topic: https://medium.com/@huntie/representing-a-uuid-as-a-base-62-hash-id-for-short-pretty-urls-c30e66bf35f9

MariuszS