12

First things first:

I'm storing multiple domains in a database, after converting each domain name to its IDNA version. What I need to know is the maximum length such an IDNA-converted domain name can have, so I can define the database field's max length.

Known fact:

Now, I know the maximum number of characters in a domain name (including any subdomains) is 255 characters.

Where I lost it:

That's easy at first glance, but... does this mean regular ASCII characters or international characters (think UTF-8 encoding)?

To give you an example: The domain "müller.de" has 9 characters when I ignore that "ü" is an international character that needs more bytes to be represented. The IDNA version of "müller.de" is "xn--mller-kva.de", which has 16 characters. This shows there's definitely a difference in maximum length depending on whether the name is IDNA-converted or not.
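For anyone who wants to reproduce these numbers, here is a minimal sketch in Python. It assumes Python's built-in `idna` codec (which follows the older IDNA 2003 rules) is an acceptable stand-in for whatever converter is actually used; the variable names are only for illustration.

```python
# Compare the length of a domain name before and after IDNA conversion.
# Assumption: Python's built-in "idna" codec (IDNA 2003) is the converter.
unicode_name = "m\u00fcller.de"  # "müller.de" with a precomposed ü
ace_name = unicode_name.encode("idna").decode("ascii")

print(ace_name)           # xn--mller-kva.de
print(len(unicode_name))  # 9
print(len(ace_name))      # 16
```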

Depending on what kind of characters they mean, the 255-character maximum could be the international character version, the IDNA converted version or even both.

And that's where I lost it a bit... especially, since I have to take into account that not all domains will be sane and stuff like "öüßüöäéèê.example.äöüßüöäéèê-äöüßüöäéèê.test.äöüßüöäéèê.com" and even worse is to be expected.

So, "guessing" and "hoping for the best" is not an option. I need to know for sure...

The question is:

Based on the known fact that the maximum number of characters in a domain name (including any subdomains) is 255 characters... what is the maximum length of an IDNA converted domain name?

Or did they mean the IDNA converted version (punycode) is also restricted to 255 characters (which would mean that domains with international/unicode characters would actually have shorter limits in their unicode representation, because their IDNA converted version would have to respect the 255 char limit)?

cmbuckley

3 Answers

10

OK, I think I found out myself and this snippet I found (by searching the internet) helped:

There were essentially two different options open for introducing internationalized domain names (IDN). The first was to make adjustments to the domain name system (DNS) which would allow unicode characters to be used directly. It was felt that this was too drastic a measure, and hence the second option was chosen. This involved compiling an algorithm to specify how a unicode string should be converted into a permitted ASCII domain name. This ACE string (ACE stands for ASCII Compatible Encoding) is then entered into the DNS. The introduction of IDN means that, for the very first time, the entry in the DNS is no longer identical with the domain name.

Source

The answer is that the length to respect is the 255 character limit as DNS expects it.

My suspicion was correct. The domain name and the entry in the DNS are two different things with IDN. It's the maximum length of the DNS entry that counts.

The domain name "müller.de" has 9 characters, but the corresponding ACE (ASCII Compatible Encoding) string "xn--mller-kva.de" has 16 characters.

It's the ACE string that is used by DNS and it's the ACE string that falls under the 255 character limit. This means that the maximum length of its Unicode (domain) version depends on which Unicode characters are used and on whether the string still fits within the 255 character limit after IDNA conversion.
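As a rough sketch of that "convert first, then measure" check in Python: the function name `fits_dns_limits` and both constants are my own choices for illustration. The total is set to 253 because, as a comment in this thread notes, the dotted text form is limited to 253 characters (which corresponds to 255 octets on the wire); the 63 per label comes from RFC 1034. Python's built-in `idna` codec (IDNA 2003) is assumed as the converter.

```python
# Check DNS length limits on the ACE (punycode) form, not the Unicode form.
MAX_NAME_LENGTH = 253   # assumption: dotted text form without a trailing dot
MAX_LABEL_LENGTH = 63   # per-label limit from RFC 1034


def fits_dns_limits(domain: str) -> bool:
    """Return True if the IDNA-converted name respects both length limits."""
    try:
        # The codec itself raises UnicodeError for empty or over-long labels.
        ace = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    if ace.endswith("."):
        ace = ace[:-1]  # ignore a single trailing root dot
    labels = ace.split(".")
    return (
        len(ace) <= MAX_NAME_LENGTH
        and all(1 <= len(label) <= MAX_LABEL_LENGTH for label in labels)
    )


print(fits_dns_limits("m\u00fcller.de"))  # True
```

A column sized for the ACE form can then safely hold any name this function accepts.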

Geez, the specs sure could've been a bit clearer on things like this. Especially as international domain names have been around since somewhere near March 1st, 2004. But I found the answer, and that's what counts.

Perhaps this can help someone who's having the same question.

The simple answer related to my database field length is 255 CHARs.

The fact that I store the domain names in their IDNA converted (punycode/ACE string) version only confirms this maximum character limit.

Community
  • Mere seconds apart... I think you win though :-) Excellent question, and glad we came to the same conclusion! – cmbuckley Jan 03 '12 at 20:59
  • 2
    Bah, who's counting the seconds? I'll simply accept your answer instead of my own. It's the least I can do to give you something back for your effort. Guess I'm in a social mood today... :) –  Jan 03 '12 at 21:07
  • Thanks for the question and answer. Any chance you could include a pointer/link to the source of the snippet or any references (RFC, IETF, etc. docs)? Thanks! – JJC Mar 24 '12 at 14:42
  • 2
    @JJC added the source article for the quote, which references the relevant RFCs. – cmbuckley Jun 11 '12 at 22:54
  • 1
    @cbuckley Good to have a wingman when I'm late due to "paid work". Makes stackoverflow so much more worthwhile when you can see the positive results of working in a community like this. *upvoted* ;) –  Jul 31 '12 at 12:20
  • 1
    DNS name limit is actually *253* characters, not 255 - http://stackoverflow.com/a/28918017/18829 – Alex Dupuy Mar 07 '15 at 18:03
  • 1
    It may also be worth noting that each Subdomain is limited to 63 characters, which would also be evaluated 'After' The Punycode conversion... https://en.wikipedia.org/wiki/Subdomain – B Hart May 02 '16 at 09:41
  • A database column for every single field is generally unnecessary. You can dump fields you won't generally search/index into a MEDIUMTEXT column named 'info', which contains serialized data (e.g. JSON). Then you don't have to worry about how much data might be stored (unless the column exceeds 16MB) AND there are major additional benefits (e.g. adding fields to and removing fields from the application is a 5 minute task that doesn't require a schema change, which dramatically reduces the number of times the app will break when making changes to something close to zero). – CubicleSoft May 22 '21 at 13:15
8

My understanding is that the 255-character limit is to be considered after the IDNA conversion.

This is because DNS records have this character limit, and in general DNS records can only contain letters, digits and hyphens (from Wikipedia). The DNS server therefore uses the Punycode version of the IDN for its record, not the Unicode version.
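To illustrate the "letters, digits and hyphens" point, here is a small hedged sketch in Python. The names `LDH_LABEL` and `is_ldh_name` are hypothetical, and it assumes Python's built-in `idna` codec (IDNA 2003) for the conversion. It simply checks that every label of the converted name uses only those characters and does not start or end with a hyphen.

```python
import re

# One label: letters/digits/hyphens only, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_name(domain: str) -> bool:
    """Return True if every label of the IDNA-converted name is LDH-only."""
    try:
        ace = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    # A trailing dot yields an empty last label, which is rejected here.
    return all(LDH_LABEL.fullmatch(label) for label in ace.split("."))


print(is_ldh_name("m\u00fcller.de"))       # True
print(is_ldh_name("foo_bar.example"))      # False
```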

cmbuckley
  • I came to the same conclusion after checking some registrar websites and their info about "Domain Names with Accents and Umlauts". Yes, it's 255 after conversion. But thanks for your reply. It means that I'm not the only one coming to the same conclusion, and that's a relaxing thing to know! ;) –  Jan 03 '12 at 21:02
-2

RFC3492 says this about one of the features of IDNA encoding:

Efficient encoding: The ratio of basic string length to extended string length is small. This is important in the context of domain names because RFC1034 restricts the length of a domain label to 63 characters.

That is it. 63 characters is the maximum length for any domain label (each dot-separated part of a domain name), regardless of whether it is in IDNA or in ASCII.

Alexander Artemenko