
I want to know what `xn--` and `-66b` mean in a domain name like `xn--(domain)-66b.com`. For example, I bought diseñolatinoamericano.com, with an ñ.

In Mozilla it appears as http://xn--diseolatinoamericano-66b.com/, and on Facebook I can't link anything.

Thanks!

Jepser Bernardino
  • Google search "ACE prefix" – wim Jun 11 '15 at 03:39
  • That wasn't very helpful. I get mostly irrelevant results. – Knogobert Dec 19 '20 at 17:41
  • If you'd like more information about `xn--`, have a look at [Is there any way to avoid showing "xn--" for IDN domains?](https://stackoverflow.com/questions/11008602/is-there-any-way-to-avoid-showing-xn-for-idn-domains) – Flimtix Mar 02 '22 at 16:10

3 Answers


It's the result of IDNA encoding, i.e. converting your Unicode domain name to its ASCII equivalent. This has to be done because DNS has no notion of Unicode: hostnames are restricted to ASCII letters, digits and hyphens.

The `xn--` prefix says "everything that follows in this label is encoded Unicode".
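For illustration, a minimal sketch assuming Python 3: its built-in `idna` codec follows the older IDNA 2003 rules (RFC 3490), so it may disagree with newer implementations on some names, but for a name containing ñ it produces the same ASCII form Firefox shows.

```python
# Round-trip the domain from the question through Python's built-in
# "idna" codec (IDNA 2003, RFC 3490).
unicode_name = "diseñolatinoamericano.com"

ascii_name = unicode_name.encode("idna")  # bytes, ASCII only
print(ascii_name)                         # b'xn--diseolatinoamericano-66b.com'

back = ascii_name.decode("idna")          # undo the conversion
print(back == unicode_name)               # True
```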

Alex K.
  • FYI, xn stands for _eXtended Names_. – noddy Jan 09 '20 at 01:38
  • @noddy That is absolutely NOT true. The `xn--` prefix was chosen **randomly** when the standard was finalized (see http://www.ccwhois.org/mailarchive/cctld-discuss/vol05/0096.html). Of course you are free to artificially attach any meaning to it you want, but this is certainly not how it was created. – Patrick Mevzek Jan 17 '20 at 14:35
  • Note: ccwhois.org is down for me, but perhaps another suitable citation might be https://web.archive.org/web/20180530223920/http://www.atm.tut.fi/list-archive/ietf-announce/msg13572.html – mwfearnley Jun 15 '22 at 09:05

This is Punycode, which is used for Internationalizing Domain Names in Applications (IDNA).

From 1:

Punycode is intended for the encoding of labels in the Internationalized Domain Names in Applications (IDNA) framework, such that these domain names may be represented in the ASCII character set allowed in the Domain Name System of the Internet. The encoding syntax is defined in IETF document RFC 3492.
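To make RFC 3492 less abstract, here is a minimal from-scratch Punycode encoder for a single label. It is a sketch that follows the encoding procedure in RFC 3492 with the parameter values from its section 5; it skips overflow checks and all IDNA steps (nameprep, the `xn--` prefix, length limits), and the function names are my own. Python's built-in `punycode` codec is only used to cross-check it.

```python
# Punycode parameter values (RFC 3492, section 5)
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def threshold(k: int, bias: int) -> int:
    """Threshold t for digit position k, clamped to [TMIN, TMAX]."""
    return TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias


def digit_char(d: int) -> str:
    """Digit value 0..25 -> 'a'..'z', 26..35 -> '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one label; no overflow checks, no nameprep, no 'xn--' prefix."""
    codepoints = [ord(c) for c in label]
    output = [c for c in label if ord(c) < 0x80]  # basic code points first
    b = h = len(output)
    if b:
        output.append("-")  # delimiter only if there were basic code points
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        m = min(c for c in codepoints if c >= n)  # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for c in codepoints:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a little-endian variable-length integer
                q, k = delta, BASE
                while True:
                    t = threshold(k, bias)
                    if q < t:
                        break
                    output.append(digit_char(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(digit_char(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("diseñolatinoamericano"))  # diseolatinoamericano-66b
```

The basic (ASCII) characters are copied verbatim before the last `-`; everything after it only describes where to insert which non-ASCII code points.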

From 2:

Internationalizing Domain Names in Applications (IDNA) is a mechanism defined in 2003 for handling internationalized domain names containing non-ASCII characters. These names either are Latin letters with diacritics (ñ, é) or are written in languages or scripts which do not use the Latin alphabet: Arabic, Hangul, Hiragana and Kanji for instance. Although the Domain Name System supports non-ASCII characters, applications such as e-mail and web browsers restrict the characters which can be used as domain names for purposes such as a hostname.
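One detail of IDNA worth seeing concretely: the conversion works label by label, for Latin letters with diacritics and non-Latin scripts alike, and only labels that contain non-ASCII characters are Punycode-encoded and prefixed with `xn--`. A simplified sketch, assuming Python 3; `to_ascii_simplified` is a made-up name, and it deliberately skips nameprep, case mapping and length checks, so it is no substitute for a real IDNA library (it only matches the built-in `idna` codec for lowercase, already-normalized input).

```python
def to_ascii_simplified(domain: str) -> str:
    """Hypothetical helper: IDNA-style ToASCII applied per label.

    Assumption: input is lowercase and normalized; no nameprep or
    length/hyphen checks are performed.
    """
    out = []
    for label in domain.split("."):
        if label.isascii():
            out.append(label)  # plain ASCII labels are left untouched
        else:
            # Only non-ASCII labels get Punycode plus the "xn--" prefix
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


for name in ["diseñolatinoamericano.com", "bücher.example", "テスト.example"]:
    print(name, "->", to_ascii_simplified(name))
```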

Jonas Schäfer

The (simplified) meaning of `66b` (i.e. the string after the last `-`) in your example is: "Move the cursor in `diseolatinoamericano` 4 characters to the right and insert an ñ". The next larger code, `76b` (the digits are read little-endian), means moving one character further, so:

$ idn2 -d xn--diseolatinoamericano-76b
diseoñlatinoamericano

If you increase the code further, you get:

-76b -> diseoñlatinoamericano
-86b -> diseolñatinoamericano
-96b -> diseolañtinoamericano
-b7b -> diseolatñinoamericano
-c7b -> diseolatiñnoamericano
-d7b -> diseolatinñoamericano
-e7b -> diseolatinoñamericano
-f7b -> diseolatinoañmericano
...
-m7b -> diseolatinoamericanño
-n7b -> diseolatinoamericanoñ

resulting in the position of the ñ moving further to the right.
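To check the table without `idn2`, a sketch assuming Python 3: the built-in `punycode` codec decodes just the label body, i.e. without the `xn--` prefix and without any IDNA validation.

```python
# Decode each suffix with Python's built-in RFC 3492 codec.
BASIC = "diseolatinoamericano"

for suffix in ["66b", "76b", "86b", "96b", "b7b", "c7b", "m7b", "n7b"]:
    decoded = (BASIC + "-" + suffix).encode("ascii").decode("punycode")
    print(f"-{suffix} -> {decoded}  (ñ at index {decoded.index('ñ')})")
```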

After that, increasing the code once more resets the insertion position to the start of the string and increases the code point of the character to insert by one. ñ has code point 241 (U+00F1) and the next code point, 242, is ò, so:

-o7b -> òdiseolatinoamericano
-p7b -> dòiseolatinoamericano
...
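Both behaviours come from a single number. A sketch, assuming Python 3, where `decode_single_insertion` is a hypothetical helper that only handles a suffix encoding exactly one non-ASCII character: it reads the little-endian variable-length integer `i` with the initial RFC 3492 parameters, and since 20 basic characters leave 21 insertion points, the position is `i % 21` and the code point is `128 + i // 21`.

```python
# Punycode parameters (RFC 3492, section 5)
BASE, TMIN, TMAX = 36, 1, 26
INITIAL_N, INITIAL_BIAS = 128, 72


def digit_value(ch: str) -> int:
    """'a'..'z' -> 0..25, '0'..'9' -> 26..35."""
    return ord(ch) - ord("a") if ch.isalpha() else ord(ch) - ord("0") + 26


def decode_single_insertion(basic: str, suffix: str) -> tuple:
    """Decode a suffix that encodes exactly one inserted character."""
    i, w, k = 0, 1, BASE
    for ch in suffix:
        d = digit_value(ch)
        i += d * w  # little-endian: later digits carry larger weights
        # The bias stays at its initial value: it only adapts after an insertion
        if k <= INITIAL_BIAS:
            t = TMIN
        elif k >= INITIAL_BIAS + TMAX:
            t = TMAX
        else:
            t = k - INITIAL_BIAS
        if d < t:
            break  # a digit below the threshold ends the number
        w *= BASE - t
        k += BASE
    slots = len(basic) + 1  # number of possible insertion positions
    return chr(INITIAL_N + i // slots), i % slots


for s in ["66b", "76b", "n7b", "o7b", "p7b"]:
    ch, pos = decode_single_insertion("diseolatinoamericano", s)
    print(s, ch, pos)
```

For `66b` this gives i = 32 + 32·35 + 1·35² = 2377 = 113·21 + 4, i.e. ñ (128 + 113) at position 4; `o7b` gives 2394 = 114·21 + 0, i.e. ò at position 0.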

The exact details (e.g. why `-a6b` had to be skipped above) can be found in RFC 3492.

Uwe Kleine-König
  • Well you've piqued my curiosity, but without a very deep reading of the RFC I can't find out why `-a6b` is skipped. Any clues? Thanks. – mwfearnley Jun 15 '22 at 09:07
  • @mwfearnley The reason is in https://datatracker.ietf.org/doc/html/rfc3492#section-3.3, that is, `-a6b` encodes two numbers and so `xn--diseolatinoamericano-6ab` has two codepoints inserted. (check `idn2 -d xn--diseolatinoamericano-6ab | hexdump -C`) – Uwe Kleine-König Jun 17 '22 at 06:07
  • @mwfearnley Notice that the letters precede the numbers in the encoding, so `a` is the smallest possible digit, which in this case falls under the first threshold, and thus would terminate the variable-length number if present as the first digit. (@Uwe Kleine-König, please correct me if I'm wrong!) – Isti115 Feb 11 '23 at 20:38
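A small sketch of the point made in the last two comments, assuming Python 3 (`threshold` is my own helper name mirroring the `t` computation in RFC 3492's decoding procedure): for the first digit the threshold is 1, so `a` (value 0) would end the number immediately, which is why the sequence jumps from `-96b` to `-b7b`.

```python
# Punycode parameters (RFC 3492, section 5)
BASE, TMIN, TMAX, INITIAL_BIAS = 36, 1, 26, 72


def threshold(k: int, bias: int) -> int:
    """Threshold t for the digit at position k, as in RFC 3492's decoder."""
    return TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias


# First digit (k = 36) with the initial bias: t == 1, so only 'a' (value 0)
# is below the threshold and ends the number after a single digit.
print(threshold(BASE, INITIAL_BIAS))  # 1

# So "-a6b" decodes as two numbers ("a", then "6b" under the adapted bias),
# i.e. two inserted code points instead of one.
decoded = b"diseolatinoamericano-a6b".decode("punycode")
print(len(decoded), [f"U+{ord(c):04X}" for c in decoded if ord(c) >= 0x80])
# 22 ['U+0080', 'U+0081']
```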