11

I am using UUIDs, but they are not particularly nice to read, write and communicate, so I would like to encode them. I could use base64 or base32, but neither is easy to read aloud: base64 mixes upper- and lower-case letters with symbols. Base32 is a bit better, but the output can still be clumsy.

I was wondering if there's a nice and clean way to encode a number into palatable phonemes, so as to achieve better readability and hopefully a bit of compression.

Kara
Stefano Borini
  • Are you looking for a way to make uuids memorable (as in pronounceable passwords) or just an effective way to, for example, read them to someone over the phone? – Dale Hagglund Oct 30 '09 at 07:27
  • read them over the phone and talk about them easily. I could implement a lookup strategy as well (like url shorteners), but before doing that I want to learn a bit more about the subject. – Stefano Borini Oct 30 '09 at 09:36

10 Answers

12

I hope you don't use this idea: The Automated Curse Generator :)

Michał Niklas
  • 1
    This is fantastic. I cannot really say you solved my question, but definitely you provided an interesting point of view. +1 – Stefano Borini Oct 30 '09 at 06:06
10

Bubble Babble is a good one to try. It generates nonsensical but readable output like:

xesef-disof-gytuf-katof-movif-baxux
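
For reference, here is a minimal Python sketch of the Bubble Babble encoding as I understand it from the published description. The function name `bubble_babble` is my own, and the code should be checked against a reference implementation before relying on it. Worked through by hand, it reproduces the example above, which is the encoding of the ASCII string "1234567890".

```python
VOWELS = "aeiouy"
CONSONANTS = "bcdfghklmnprstvzx"  # 17 letters; index 16 ('x') is used as filler


def bubble_babble(data: bytes) -> str:
    """Encode bytes as a Bubble Babble string (sketch, not a reference implementation)."""
    seed = 1                       # running checksum folded into the vowels
    rounds = len(data) // 2 + 1
    out = ["x"]                    # every encoding starts and ends with 'x'
    for i in range(rounds):
        if i + 1 < rounds or len(data) % 2 != 0:
            b1 = data[2 * i]
            out.append(VOWELS[(((b1 >> 6) & 3) + seed) % 6])
            out.append(CONSONANTS[(b1 >> 2) & 15])
            out.append(VOWELS[((b1 & 3) + seed // 6) % 6])
            if i + 1 < rounds:
                b2 = data[2 * i + 1]
                out.append(CONSONANTS[(b2 >> 4) & 15])
                out.append("-")
                out.append(CONSONANTS[b2 & 15])
                seed = (seed * 5 + b1 * 7 + b2) % 36
        else:
            # Even-length input: the last group carries only the checksum.
            out.append(VOWELS[seed % 6])
            out.append(CONSONANTS[16])
            out.append(VOWELS[seed // 6])
    out.append("x")
    return "".join(out)


print(bubble_babble(b"1234567890"))  # xesef-disof-gytuf-katof-movif-baxux
```

For a UUID you would pass its 16 raw bytes, for example `bubble_babble(uuid.uuid4().bytes)`.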
5

This question is very old; interestingly, as old as the solution I'm about to present, but it hasn't been mentioned here yet.

It's Proquint. Similar to Bubble Babble, but the differences make the results easier to read, in my opinion.

Here's how it works, from their documentation:

In sum, we propose encoding a 16-bit string as a proquint [PRO-nouncable QUINT-uplet] of alternating consonants and vowels as follows.

Four-bits as a consonant:

0 1 2 3 4 5 6 7 8 9 A B C D E F
b d f g h j k l m n p r s t v z

Two-bits as a vowel:

0 1 2 3
a i o u

Whole 16-bit word, where "con" = consonant, "vo" = vowel:

 0 1 2 3 4 5 6 7 8 9 A B C D E F
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|con    |vo |con    |vo |con    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Separate proquints using dashes, which can go un-pronounced or be pronounced "eh". The suggested optional magic number prefix to a sequence of proquints is "0q-".

Here are some IP dotted-quads and their corresponding proquints.

127.0.0.1       lusab-babad
63.84.220.193   gutih-tugad
63.118.7.35     gutuk-bisog
140.98.193.141  mudof-sakat
64.255.6.200    haguz-biram
128.30.52.45    mabiv-gibot
147.67.119.2    natag-lisaf
212.58.253.68   tibup-zujah
216.35.68.215   tobog-higil
216.68.232.21   todah-vobij
198.81.129.136  sinid-makam
12.110.110.204  budov-kuras
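
The tables above translate fairly directly into code. What follows is a rough Python sketch, not the official implementation; the function names (`word_to_proquint`, `encode`, `decode`) are mine, and it assumes big-endian 16-bit words, which is what the IP examples imply.

```python
import uuid

CONSONANTS = "bdfghjklmnprstvz"  # 4 bits each
VOWELS = "aiou"                  # 2 bits each


def word_to_proquint(w: int) -> str:
    """Encode one 16-bit word as con-vo-con-vo-con (4+2+4+2+4 bits)."""
    return (CONSONANTS[(w >> 12) & 0xF]
            + VOWELS[(w >> 10) & 0x3]
            + CONSONANTS[(w >> 6) & 0xF]
            + VOWELS[(w >> 4) & 0x3]
            + CONSONANTS[w & 0xF])


def encode(data: bytes) -> str:
    """Encode an even number of bytes as dash-separated proquints."""
    if len(data) % 2:
        raise ValueError("proquints encode whole 16-bit words")
    words = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return "-".join(word_to_proquint(w) for w in words)


def decode(text: str) -> bytes:
    """Inverse of encode(); raises ValueError on malformed input."""
    if not text:
        return b""
    out = bytearray()
    for quint in text.split("-"):
        c1, v1, c2, v2, c3 = quint
        w = (CONSONANTS.index(c1) << 12 | VOWELS.index(v1) << 10
             | CONSONANTS.index(c2) << 6 | VOWELS.index(v2) << 4
             | CONSONANTS.index(c3))
        out += w.to_bytes(2, "big")
    return bytes(out)


print(encode(bytes([127, 0, 0, 1])))   # lusab-babad
print(encode(uuid.uuid4().bytes))      # eight proquints for a 128-bit UUID
```

A 128-bit UUID comes out as eight proquints, i.e. 40 letters plus 7 dashes.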
Zwyx
4

Why not use something similar to what PGP does to create readable keys? Simply find a nice list of distinctive words. Let's say you're using 128-bit UUIDs: with a list of 256 words (2^8), each word encodes one byte, so a UUID becomes 16 words.
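
A minimal sketch of the one-word-per-byte idea in Python. The `WORDS` list here is a made-up placeholder (16 x 16 invented syllable pairs), not the PGP word list; a real list of 256 distinctive words would be swapped in. The helper names `to_words` and `from_words` are also hypothetical.

```python
import uuid

# Placeholder vocabulary: 256 invented words standing in for a real word list.
ONSETS = ["ba", "de", "fi", "go", "hu", "ka", "le", "mi",
          "no", "pu", "ra", "se", "ti", "vo", "zu", "wa"]
CODAS = ["bin", "dor", "fel", "gan", "hok", "jum", "kep", "lar",
         "mos", "nit", "pav", "rus", "sem", "tob", "vik", "zed"]
WORDS = [a + b for a in ONSETS for b in CODAS]  # index 0..255 -> word


def to_words(u: uuid.UUID) -> list:
    """One word per byte: a 128-bit UUID yields 16 words."""
    return [WORDS[b] for b in u.bytes]


def from_words(words) -> uuid.UUID:
    """Look each word up again to recover the original bytes."""
    return uuid.UUID(bytes=bytes(WORDS.index(w) for w in words))


u = uuid.uuid4()
spoken = to_words(u)
print(" ".join(spoken))
print(from_words(spoken) == u)  # True
```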

Stupid question, but why do people need to read or write UUIDs in your application?

  • I need to generate unique ids, because I am going to perform merging in the future. However, the objects I create are identified by URIs that contain the uuid. Of course I can assign more meaningful names, but I cannot expect every object I create to have a meaningful name. Still, I'd like to have something that can be spelled out. – Stefano Borini Oct 30 '09 at 06:05
  • Your idea is interesting. I think that using full words is a bit overkill, but I like it. Looking for something shorter though. – Stefano Borini Oct 30 '09 at 06:11
  • Then I would just go with hex encoding, 0-9, a-f; most people can read/pronounce those without too much trouble. –  Oct 30 '09 at 08:39
3

If all you want is a way to communicate hex values readably (ie, over the phone, or when instructing someone verbally what to type), then I suggest you use one of the various phonetic alphabets, such as the NATO Phonetic Alphabet or the US Army/Navy Phonetic Alphabet.

In the latter, the letters A-F are spoken as "able", "baker", "charlie", "dog", "easy", and "fox", respectively, so you would read the hex sequence "3fd2cc0e" as "three fox dog two charlie charlie zero easy". A uuid would be read out in exactly the same fashion.
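
As a small illustration, here is a Python sketch that spells out a hex string this way. The letter words are the ones quoted above; reading the digits as plain English numerals is my assumption (it matches the "three ... two ... zero" in the example), and the name `spell_hex` is made up.

```python
SPOKEN = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "a": "able", "b": "baker", "c": "charlie",
    "d": "dog", "e": "easy", "f": "fox",
}


def spell_hex(hex_string: str) -> str:
    """Spell a hex string (dashes ignored) one symbol at a time."""
    return " ".join(SPOKEN[ch] for ch in hex_string.lower() if ch != "-")


print(spell_hex("3fd2cc0e"))
# three fox dog two charlie charlie zero easy
```

Passing `str(some_uuid)` works as well, since the dashes are skipped.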

Dale Hagglund
2

Bubble babble and base32 are inefficient, especially in your case. I suggest that you make your own algorithm. Since there are 20 consonants and 6 vowels (including 'y') you can have approx. 20*6*2+6*6=276 consonant/vowel-vowel/consonant pairs. So every byte of your number can be represented by a pair. With a bit of tweaking your algorithm could produce pronounceable words much shorter than bubble babble. You could even play dice and replace all odd digits with a consonant/vowel. For example, 0123456789ABCDEF (hex) encodes to ABECIDOFUGYHKRM. 3141592654 (dec) encodes to HHIA-ROIR. You are left with ten spare consonants which can be paired with vowels to replace some double consonants etc.
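
To make the byte-per-pair part concrete, here is one possible reading of it as a Python sketch. The ordering of the pair table and the names `PAIRS`, `TABLE`, `encode` and `decode` are my own choices, not something specified above. It builds the 276 consonant-vowel, vowel-consonant and vowel-vowel pairs and uses the first 256 of them as a byte alphabet.

```python
from itertools import product

CONSONANTS = "bcdfghjklmnpqrstvwxz"  # 20 letters
VOWELS = "aeiouy"                    # 6 letters, counting 'y'

# 20*6 consonant-vowel + 6*20 vowel-consonant + 6*6 vowel-vowel = 276 pairs
PAIRS = (["".join(p) for p in product(CONSONANTS, VOWELS)]
         + ["".join(p) for p in product(VOWELS, CONSONANTS)]
         + ["".join(p) for p in product(VOWELS, VOWELS)])
TABLE = PAIRS[:256]                  # one pair per possible byte value
INDEX = {pair: i for i, pair in enumerate(TABLE)}


def encode(data: bytes) -> str:
    """Replace every byte with its two-letter pair."""
    return "".join(TABLE[b] for b in data)


def decode(text: str) -> bytes:
    """Pairs are fixed-width, so decoding is just chunking by two."""
    return bytes(INDEX[text[i:i + 2]] for i in range(0, len(text), 2))


print(len(PAIRS))              # 276
print(encode(bytes([0, 1, 255])))
```

With one pair per byte, a 16-byte UUID becomes 32 letters, the same length as hex without dashes. Adjacent pairs can still produce awkward clusters (a vowel-consonant pair followed by a consonant-vowel pair, for instance), so the table ordering would need tweaking to get consistently pronounceable output.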

Hexacon
1

S/KEY uses a dictionary of 2048 words to map 64-bit numbers to a sequence of 6 predefined words/syllables. (People will always find swear words if they are looking for them ;) )

Community
devio
1

Urbit's phonetic naming system wasn't mentioned yet. It uses 3 characters for 8 bits, 6 for 16, so it's less efficient than Proquint or Bubble Babble, but more divisible.

ecloud
0

and hopefully a bit of compression

Not sure exactly what you mean there; making something "readable" or "pronounceable" will inevitably expand the space required for it. Maybe you meant "hopefully a bit of redundancy"? It would be good if, even when the user makes a small mistake, the system could detect and perhaps even correct it.

Really it depends very much on how big your UUIDs are and how they are most often communicated. If they need to be communicated over phone or VoIP, you want more audible redundancy. If they need to be entered into mobile devices with numeric keypads, it tends to be difficult to enter alphabetic characters, more so if they are case-sensitive. If they are written down a lot, you need to worry about characters that look similar (O and 0 and o, for instance). If they need to be memorised, then strings of real words are probably best (have a look at the PGP Word List).

However, I think a great all-round solution is just using numeric digits. They're a lot harder to confuse with each other (both when spoken and written) than some alphabetic characters. They're easy to enter on mobile devices, and people aren't too bad at memorising numbers.

And the length of the string is not too bad either. Let's compare base32 with base 10 (decimal). The length of a decimal string is log_10(32) times the length of the corresponding base32 string, or about 1.5 times as long. Ten characters of base32 correspond to about 15 decimal digits (16 in the worst case, since 2^50 is a 16-digit number).

Not much of a penalty, IMO, seeing as in base 32 it's easy to confuse C and T, or S, F and X (when spoken), and someone speaking with a foreign accent is more likely to cause trouble.
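
To put numbers on the length comparison, here is a small Python sketch (the helper name `digits_needed` is made up) that counts how many symbols the largest value of a given bit width needs in a given base. It uses integer arithmetic only, so there is no floating-point rounding to worry about.

```python
import math


def digits_needed(bits: int, base: int) -> int:
    """Number of base-`base` digits needed for the largest `bits`-bit value."""
    n = (1 << bits) - 1
    count = 0
    while n:
        n //= base
        count += 1
    return count


for base in (10, 16, 32, 64):
    print(base, digits_needed(128, base))
# 10 39
# 16 32
# 32 26
# 64 22

print(round(math.log10(32), 3))  # 1.505, the decimal-to-base32 length ratio
print(digits_needed(50, 10))     # 16: ten base32 characters carry 50 bits
```

So a full 128-bit UUID is 39 decimal digits against 26 base32 characters, which is the roughly 1.5x penalty discussed above.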

Artelius
  • 1
    What I mean is that, for example, the sequence from 00 to FF is in base 16. If you accept tokens like "wa" or "su" or "me", you have more flexibility and consequently it takes less space. For example, a UUID encoded in base64 takes only 22 characters, and 26 in base32. – Stefano Borini Oct 30 '09 at 06:49
  • So, you're looking for a reasonably space- or time-efficient (not necessarily for a computer, maybe for a person) means of representing UUIDs. I've revised my answer to further discuss why I think KISS (and use decimal) is often the best way to go. – Artelius Oct 30 '09 at 07:08
  • 1
    If you're using long strings of digits, PLEASE put a dash in every 4 characters so that people can use their well-trained short-term memory (credit card numbers, phone numbers) to read digits in groups of 4. –  Oct 30 '09 at 08:41
-3

If they were easy to read they probably wouldn't be particularly unique.

Azeem.Butt
  • That's not true. A UUID is just a large number. It's how you encode it that makes the difference. – Stefano Borini Oct 30 '09 at 05:55
  • This seems like a reasonable statement: the set of pronounceable objects **is** probably less than the set of unique numbers. – pavium Oct 30 '09 at 06:12
  • If you know that it's not true then why haven't you solved your own problem? – Azeem.Butt Oct 30 '09 at 06:21
  • 1
    the set of characters from 0 to f is probably less than the set of unique numbers. Still you see uuid encoded in hex and I can guarantee you they are very unique. – Stefano Borini Oct 30 '09 at 06:22