5

Is it possible to use UTF-8 in a subdomain? If so, which characters are allowed and how does the can't-mix-encodings thing work?

I've tried to RTFM, but Google wasn't of much help.

krtek
Fluffy

1 Answer

4

There is nothing special about subdomains. A given domain name foo.example.com is an ordered list of labels (foo, example, com), so the real question is whether you can use UTF-8 in a given label.

The low level answer is that a label is defined as:

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"

which means that a label can only contain characters from [-a-zA-Z0-9], and that it must start with a letter and end with a letter or a digit.
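As a rough illustration, the grammar above can be transcribed into a regular expression. This is only a sketch: the names `LABEL_RE` and `is_ldh_label` are made up for this example, and the 63-octet length check is an extra DNS limit that is not part of the grammar quoted here.

```python
import re

# Transcription of the <label> grammar: one letter, optionally followed by
# letters/digits/hyphens, with the last character being a letter or digit.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` matches the grammar (hypothetical helper)."""
    # Assumption: labels are also capped at 63 octets, a limit that comes
    # from the DNS specification rather than from the grammar itself.
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


print(is_ldh_label("foo"))     # plain ASCII label
print(is_ldh_label("-foo"))    # leading hyphen is rejected
print(is_ldh_label("bücher"))  # non-ASCII letter is rejected
```

Note that this strict grammar also rejects labels starting with a digit, even though many resolvers and registries accept them in practice.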

However, IDNA can be used to encode Unicode characters. In short, a label containing other characters is encoded with: "xn--" + punycode(nameprep(label)).
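The formula above can be tried out with Python's standard library, which ships a `punycode` codec and a `nameprep` function in `encodings.idna`. This is a minimal sketch, not a full IDNA implementation: `to_ace` is a hypothetical helper name, and it skips the additional checks (length limits, forbidden code points in the result) that a real ToASCII step performs.

```python
from encodings.idna import nameprep


def to_ace(label: str) -> str:
    """Encode one label as "xn--" + punycode(nameprep(label)) (sketch only)."""
    if label.isascii():
        # Pure-ASCII labels need no encoding and are passed through as-is.
        return label
    prepared = nameprep(label)  # normalisation and case folding
    return "xn--" + prepared.encode("punycode").decode("ascii")


print(to_ace("bücher"))  # xn--bcher-kva
print(to_ace("foo"))     # foo
```

In everyday code the built-in codec does the same job in one call: `"bücher".encode("idna")` yields the same ASCII form as bytes.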

As for limitations, at least:

  • four characters can't be in an IDN label (U+002E, U+3002, U+FF0E, U+FF61), because they are all treated as label separators (dots).
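To make the separator rule concrete, here is a small sketch that splits a name on any of the four dot characters listed above. The names `IDNA_DOTS` and `split_labels` are invented for this example.

```python
import re

# The four code points that count as label separators in an IDN:
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name: str) -> list[str]:
    """Split a domain name into labels on any of the four dots (sketch)."""
    return IDNA_DOTS.split(name)


print(split_labels("foo\u3002example\uFF0Ecom"))  # ['foo', 'example', 'com']
```

Since each of these characters always ends a label, none of them can ever appear inside one.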
Community
ysdx
  • 1
    Link-only answers are bad in general (and I also happen not to be able to find much relating to the actual question of *subdomains* there) – Jasper Apr 01 '15 at 14:57