How are Urbit phonetic names encoded?

Question

Urbit points (network addresses) are identified by 32-bit integers, but they're typically not referred to by their number. Instead, I usually see them represented in a human-pronounceable form where every byte is converted into a three-letter syllable. For example:

  8 bits  galaxy  ~lyt
 16 bits  star    ~diglyt
 32 bits  planet  ~picder-ragsyt
 64 bits  moon    ~diglyt-diglyt-picder-ragsyt
128 bits  comet   ~racmus-mollen-fallyt-linpex--watres-sibbur-modlux-rinmex

I initially assumed that every byte had a single text representation, but I have seen that planet names usually don't include the name of their star, so it must be more complicated than that.

How does Urbit's phonetic name encoding system (@p-names) work?

Jeremy · Accepted Answer · 2019-04-11T02:00:50.840

Urbit's phonetic naming system encodes unsigned integers as human-readable strings. These unsigned integers sometimes stand for byte strings, read in big-endian order (although an integer can't track leading zero bytes, so the byte length must be communicated out-of-band if needed). The phonetic naming scheme operates on these big-endian bytes.
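To illustrate the leading-zero caveat, here is a minimal Python sketch. The helper name to_big_endian is my own and not part of any Urbit tooling; it only shows how an integer maps to its shortest big-endian byte string, and why the original length has to be supplied separately to get dropped zero bytes back.

```python
def to_big_endian(n: int) -> bytes:
    """Shortest big-endian byte string for a non-negative integer (at least one byte)."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")

original = bytes([0x00, 0x01, 0x02])       # three bytes, the first one is zero
n = int.from_bytes(original, "big")        # the integer forgets the leading zero
shortest = to_big_endian(n)                # only two bytes come back
restored = n.to_bytes(len(original), "big")  # length supplied out-of-band

print(n, shortest.hex(), restored.hex())
```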

The phonetic naming system has two variants. For general use there is @q-encoding, which is suitable for values of any length, and is frequently used to represent binary data in Hoon code or when interacting with the Dojo REPL. For Urbit point names there is @p-encoding, which is based on @q-encoding but modifies certain cases.

`@q`-Encoding: Pairs of Syllables

Urbit phonetic names are made up of 3-letter syllables, organized in two lists of 256 syllables each. Each syllable consists of a consonant, a vowel, then another consonant. The "prefix" syllable list uses the vowels a, i, and o, and the "suffix" syllable list uses the vowels e, u, and y, with one exception: zod, the first entry in the suffix list. The full syllable lists are included below.

Values fitting in one byte, from 0x00 to 0xFF, are encoded by taking the corresponding syllable from the suffix list. Examples: 0x00 becomes ~zod, 0x01 becomes ~nec.

Values fitting in two bytes, from 0x0100 to 0xFFFF, are encoded by looking up the syllable corresponding to the high byte in the prefix list and concatenating the syllable corresponding to the low byte in the suffix list. Examples: 0x0100 becomes ~marzod, 0x0101 becomes ~marnec.

Larger values are encoded by splitting them into two-byte pairs in big-endian order, encoding each pair as described above for values fitting in two bytes, and joining the results with hyphen (-) characters. If the value has an odd number of bytes, it is padded with a leading zero byte to complete the first pair. Examples: 0x01_0000 becomes ~doznec-dozzod, 0x0101_0101 becomes ~marnec-marnec.
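The three rules above can be sketched in Python as follows. This is an illustration rather than the reference implementation: the names patq, unpatq, and split3 are my own, the syllable tables are the lists from the end of this answer packed into strings, and the sketch joins every pair with a single hyphen (it does not reproduce the double-hyphen grouping seen in long names such as comets).

```python
def split3(s: str) -> list[str]:
    """Cut a packed string into 3-letter syllables."""
    return [s[i:i + 3] for i in range(0, len(s), 3)]

PREFIXES = split3(
    "dozmarbinwansamlitsighidfidlissog"
    "dirwacsabwissibrigsoldopmodfoglidhopdar"
    "dorlorhodfolrintogsilmirholpaslacrovliv"
    "dalsatlibtabhanticpidtorbolfosdotlosdil"
    "forpilramtirwintadbicdifrocwidbisdasmid"
    "loprilnardapmolsanlocnovsitnidtipsicrop"
    "witnatpanminritpodmottamtolsavposnapnop"
    "somfinfonbanmorworsipronnorbotwicsocwat"
    "dolmagpicdavbidbaltimtasmalligsivtagpad"
    "saldivdactansidfabtarmonranniswolmispal"
    "lasdismaprabtobrollatlonnodnavfignomnib"
    "pagsopralbilhaddocridmocpacravripfaltod"
    "tiltinhapmicfanpattaclabmogsimsonpinlom"
    "rictapfirhasbosbatpochactidhavsaplindib"
    "hosdabbitbarracparloddosbortochilmactom"
    "digfilfasmithobharmighinradmashalraglag"
    "fadtopmophabnilnosmilfopfamdatnoldinhat"
    "nacrisfotribhocnimlarfitwalrapsarnalmos"
    "landondanladdovrivbacpollaptalpitnambon"
    "rostonfodponsovnocsorlavmatmipfip"
)
SUFFIXES = split3(
    "zodnecbudwessevpersutletfulpensyt"
    "durwepserwylsunrypsyxdyrnuphebpeglupdep"
    "dysputlughecryttyvsydnexlunmeplutseppes"
    "delsulpedtemledtulmetwenbynhexfebpyldul"
    "hetmevruttylwydtepbesdexsefwycburdernep"
    "purrysrebdennutsubpetrulsynregtydsupsem"
    "wynrecmegnetsecmulnymtevwebsummutnyxrex"
    "tebfushepbenmuswyxsymselrucdecwexsyrwet"
    "dylmynmesdetbetbeltuxtugmyrpelsyptermeb"
    "setdutdegtexsurfeltudnuxruxrenwytnubmed"
    "lytdusnebrumtynseglyxpunresredfunrevref"
    "mectedrusbexlebduxrynnumpyxrygryxfeptyr"
    "tustyclegnemfermertenlusnussyltecmexpub"
    "rymtucfyllepdebbermughuttunbylsudpemdev"
    "lurdefbusbeprunmelpexdytbyttyplevmylwed"
    "ducfurfexnulluclennerlexrupnedlecrydlyd"
    "fenwelnydhusrelrudneshesfetdesretdunler"
    "nyrsebhulrylludremlysfynwerrycsugnysnyl"
    "lyndyndemluxfedsedbecmunlyrtesmudnytbyr"
    "senwegfyrmurtelreptegpecnelnevfes"
)

def patq(n: int) -> str:
    """Encode a non-negative integer following the @q rules described above."""
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    if len(raw) == 1:                       # one byte: a single suffix syllable
        return "~" + SUFFIXES[raw[0]]
    if len(raw) % 2:                        # odd byte count: pad with a zero byte
        raw = b"\x00" + raw
    pairs = [PREFIXES[raw[i]] + SUFFIXES[raw[i + 1]] for i in range(0, len(raw), 2)]
    return "~" + "-".join(pairs)

def unpatq(name: str) -> int:
    """Decode a @q-style name back into an integer."""
    syllables = split3(name.lstrip(".~").replace("-", ""))
    if len(syllables) == 1:
        return SUFFIXES.index(syllables[0])
    raw = bytes(
        (PREFIXES if i % 2 == 0 else SUFFIXES).index(s)
        for i, s in enumerate(syllables)
    )
    return int.from_bytes(raw, "big")

for value in (0x00, 0x01, 0x0100, 0x0101, 0x01_0000, 0x0101_0101):
    print(hex(value), patq(value))
```

Looking syllables up with list.index keeps the sketch short; a real decoder would more likely build a dictionary from syllable to byte value once and reuse it.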

`@p`-Encoding: Scrambling Planets

The @p-encoding scheme is the same as @q for most values. However, it is different for values between 17 and 64 bits, which correspond to the IDs of planets and moons.

Planets are intended to correspond to real individuals on the Urbit network. Each planet is spawned from a star, and the 16 lower bits of the planet's ID are those of its parent star's ID. Under the @q-encoding system, this would also mean that the last two syllables of every planet's name would be its star's name. The Urbit developers didn't want each individual's name on the network to include the name of the star that happened to spawn their planet initially: that would artificially associate them with the star forever, even though they could immediately transfer their planet to a different star.

Their solution was to scramble all planet names randomly, to obfuscate the relationship between a planet's name and its parent star's name. This is implemented as a custom (obviously non-secure) cipher over the space of possible planet IDs. Because each star has 2¹⁶ - 1 planets, the number of planets is not a power of two, so a conventional block cipher won't work directly. Instead, they use the construction described in Ciphers with Arbitrary Finite Domains (Black and Rogaway 2002) over a custom Feistel-style block cipher optimized for speed (and compatibility).
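To give a feel for how a cipher can be restricted to a domain whose size is not a power of two, here is a sketch of cycle-walking, one of the techniques discussed in that paper. Everything here is a stand-in: the SHA-256-based round function, the round count, and the function names are my own assumptions, and this will not reproduce Urbit's actual scrambled names. It only shows the idea of re-applying a 32-bit permutation until the result lands back in the planet range 0x1_0000 to 0xFFFF_FFFF.

```python
import hashlib

def round_fn(rnd: int, half: int) -> int:
    """Toy round function (an assumption, not Urbit's): hash the round number and half-block."""
    digest = hashlib.sha256(bytes([rnd]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def feistel(x: int, half_bits: int = 16, rounds: int = 4) -> int:
    """Balanced Feistel network: a permutation of [0, 2**(2 * half_bits))."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for rnd in range(rounds):
        left, right = right, left ^ (round_fn(rnd, right) & mask)
    return (left << half_bits) | right

def feistel_inv(y: int, half_bits: int = 16, rounds: int = 4) -> int:
    """Inverse of feistel(): undo the rounds in reverse order."""
    mask = (1 << half_bits) - 1
    left, right = y >> half_bits, y & mask
    for rnd in reversed(range(rounds)):
        left, right = right ^ (round_fn(rnd, left) & mask), left
    return (left << half_bits) | right

def scramble(x: int, lo: int = 0x1_0000, hi: int = 0xFFFF_FFFF, half_bits: int = 16) -> int:
    """Cycle-walk: keep applying the permutation until the result is inside [lo, hi]."""
    if not lo <= x <= hi:
        raise ValueError("value outside the scrambled domain")
    y = feistel(x, half_bits)
    while not lo <= y <= hi:
        y = feistel(y, half_bits)
    return y

def unscramble(y: int, lo: int = 0x1_0000, hi: int = 0xFFFF_FFFF, half_bits: int = 16) -> int:
    """Inverse cycle-walk using the inverse permutation."""
    if not lo <= y <= hi:
        raise ValueError("value outside the scrambled domain")
    x = feistel_inv(y, half_bits)
    while not lo <= x <= hi:
        x = feistel_inv(x, half_bits)
    return x

planet = 0x0001_0101
print(hex(scramble(planet)), hex(unscramble(scramble(planet))))
```

The walk always terminates because a permutation's cycle eventually returns to the starting value, which is itself inside the domain. Since only 2¹⁶ of the 2³² values fall outside the planet range, a second application is rarely needed.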

This scrambling is applied to planet IDs, and to the lower 32 bits of a moon ID (which correspond to its parent planet's ID). Under @p-encoding, the planet with ID 0x01_0101 becomes ~ralnyt-botdyt, showing no connection to its parent star ~marnec. The star-planet relationship is the only one that is obfuscated. If you look at the names of a planet's moons, they include the name of the planet directly: for example, ~ralnyt-botdyt's moon 0x01_0001_0101 becomes ~doznec-ralnyt-botdyt, and 0x02_0001_0101 becomes ~dozbud-ralnyt-botdyt.
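The numeric relationships themselves are plain bit masks, which the following sketch spells out. The function names point_class and parent are hypothetical, the size classes follow the table in the question, and only the two parent relationships described in this answer (planet to star, moon to planet) are covered.

```python
def point_class(point: int) -> str:
    """Classify a point by the number of bits needed to represent it."""
    bits = point.bit_length()
    if bits <= 8:
        return "galaxy"
    if bits <= 16:
        return "star"
    if bits <= 32:
        return "planet"
    if bits <= 64:
        return "moon"
    if bits <= 128:
        return "comet"
    raise ValueError("too large to be a point")

def parent(point: int) -> int:
    """Parent ID for planets and moons only (the cases described above)."""
    kind = point_class(point)
    if kind == "planet":
        return point & 0xFFFF          # lower 16 bits: the parent star
    if kind == "moon":
        return point & 0xFFFF_FFFF     # lower 32 bits: the parent planet
    raise ValueError(f"parent of a {kind} is not covered by this sketch")

planet = 0x01_0101
moon = 0x01_0001_0101
print(point_class(planet), hex(parent(planet)))   # parent is the star 0x0101
print(point_class(moon), hex(parent(moon)))       # parent is the planet 0x0001_0101
```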

Implementations

When writing Hoon code, such as at the Dojo REPL, you can cast a value with @p or @q to print it as the corresponding phonetic name. In Hoon, a @p-encoded value is identified with the prefix ~ and a @q-encoded value is identified with the prefix .~, and either can be converted back to a number by casting with @u. Hoon also uses the period character (.) as a mandatory thousands separator in integer literals.

> `@p`1.529.729.032
~diglyt-diglyt
> `@q`1.529.729.032
.~fonbyn-mopful
> `@u`~diglyt-diglyt
1.529.729.032
> `@u`.~diglyt-diglyt
3.246.440.832
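If you are moving numbers between the Dojo and another language, the dotted decimal form is easy to produce and parse. This small Python sketch uses helper names of my own choosing (hoon_ud and parse_hoon_ud) and handles only plain decimal literals like the ones shown above.

```python
def hoon_ud(n: int) -> str:
    """Format a non-negative integer with periods as thousands separators."""
    return f"{n:,}".replace(",", ".")

def parse_hoon_ud(text: str) -> int:
    """Parse a dotted decimal literal back into an integer."""
    return int(text.replace(".", ""))

print(hoon_ud(1529729032))
print(parse_hoon_ud("3.246.440.832"))
```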

In JavaScript, the official urbit-ob package provides similar functions.

import ob from "urbit-ob";
ob.patp(1529729032);           // ~diglyt-diglyt
ob.patq(1529729032);           // ~fonbyn-mopful
ob.patp2dec("~diglyt-diglyt"); // 1529729032
ob.patq2dec("~diglyt-diglyt"); // 3246440832

Full Syllable Lists

prefixes = ["doz","mar","bin","wan","sam","lit","sig","hid","fid","lis","sog",
"dir","wac","sab","wis","sib","rig","sol","dop","mod","fog","lid","hop","dar",
"dor","lor","hod","fol","rin","tog","sil","mir","hol","pas","lac","rov","liv",
"dal","sat","lib","tab","han","tic","pid","tor","bol","fos","dot","los","dil",
"for","pil","ram","tir","win","tad","bic","dif","roc","wid","bis","das","mid",
"lop","ril","nar","dap","mol","san","loc","nov","sit","nid","tip","sic","rop",
"wit","nat","pan","min","rit","pod","mot","tam","tol","sav","pos","nap","nop",
"som","fin","fon","ban","mor","wor","sip","ron","nor","bot","wic","soc","wat",
"dol","mag","pic","dav","bid","bal","tim","tas","mal","lig","siv","tag","pad",
"sal","div","dac","tan","sid","fab","tar","mon","ran","nis","wol","mis","pal",
"las","dis","map","rab","tob","rol","lat","lon","nod","nav","fig","nom","nib",
"pag","sop","ral","bil","had","doc","rid","moc","pac","rav","rip","fal","tod",
"til","tin","hap","mic","fan","pat","tac","lab","mog","sim","son","pin","lom",
"ric","tap","fir","has","bos","bat","poc","hac","tid","hav","sap","lin","dib",
"hos","dab","bit","bar","rac","par","lod","dos","bor","toc","hil","mac","tom",
"dig","fil","fas","mit","hob","har","mig","hin","rad","mas","hal","rag","lag",
"fad","top","mop","hab","nil","nos","mil","fop","fam","dat","nol","din","hat",
"nac","ris","fot","rib","hoc","nim","lar","fit","wal","rap","sar","nal","mos",
"lan","don","dan","lad","dov","riv","bac","pol","lap","tal","pit","nam","bon",
"ros","ton","fod","pon","sov","noc","sor","lav","mat","mip","fip"]

suffixes = ["zod","nec","bud","wes","sev","per","sut","let","ful","pen","syt",
"dur","wep","ser","wyl","sun","ryp","syx","dyr","nup","heb","peg","lup","dep",
"dys","put","lug","hec","ryt","tyv","syd","nex","lun","mep","lut","sep","pes",
"del","sul","ped","tem","led","tul","met","wen","byn","hex","feb","pyl","dul",
"het","mev","rut","tyl","wyd","tep","bes","dex","sef","wyc","bur","der","nep",
"pur","rys","reb","den","nut","sub","pet","rul","syn","reg","tyd","sup","sem",
"wyn","rec","meg","net","sec","mul","nym","tev","web","sum","mut","nyx","rex",
"teb","fus","hep","ben","mus","wyx","sym","sel","ruc","dec","wex","syr","wet",
"dyl","myn","mes","det","bet","bel","tux","tug","myr","pel","syp","ter","meb",
"set","dut","deg","tex","sur","fel","tud","nux","rux","ren","wyt","nub","med",
"lyt","dus","neb","rum","tyn","seg","lyx","pun","res","red","fun","rev","ref",
"mec","ted","rus","bex","leb","dux","ryn","num","pyx","ryg","ryx","fep","tyr",
"tus","tyc","leg","nem","fer","mer","ten","lus","nus","syl","tec","mex","pub",
"rym","tuc","fyl","lep","deb","ber","mug","hut","tun","byl","sud","pem","dev",
"lur","def","bus","bep","run","mel","pex","dyt","byt","typ","lev","myl","wed",
"duc","fur","fex","nul","luc","len","ner","lex","rup","ned","lec","ryd","lyd",
"fen","wel","nyd","hus","rel","rud","nes","hes","fet","des","ret","dun","ler",
"nyr","seb","hul","ryl","lud","rem","lys","fyn","wer","ryc","sug","nys","nyl",
"lyn","dyn","dem","lux","fed","sed","bec","mun","lyr","tes","mud","nyt","byr",
"sen","weg","fyr","mur","tel","rep","teg","pec","nel","nev","fes"]
