
I have some IDNA-encoded strings that I cannot decode. In Python, I try u"xn--grohandel-shop-2fb".decode("idna") and get the error "IDNA does not round-trip". The same happens with "xn--sottmqqo5-lgbe9b7no0hmz9u".

I'm stumped, and Googling the error doesn't help at all.

Steve



The error "IDNA does not round-trip" means that decoding the string and then encoding the result again does not give back the original input.

Looking at the source code of Python's IDNA codec (Lib/encodings/idna.py), the error "IDNA does not round-trip" is raised on line 139 when the module can't recreate the input. The decode function splits the input on dots and converts every label with ToUnicode. ToUnicode decodes the label, but before returning the result it encodes it again with ToASCII, compares that (case-insensitively) with the input, and raises the error if they differ. In other words, the label "doesn't round-trip": encode(decode(text)) != text.
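The check can be reproduced step by step with the standard library. This is a minimal Python 3 sketch, assuming CPython's encodings.idna module; its ToASCII function is an internal helper rather than a documented API, so it may change between versions. The function names round_trip_check and stdlib_codec_rejects are hypothetical, and "xn--mnchen-3ya" (münchen) is an extra label added only for comparison.

```python
from __future__ import annotations

import encodings.idna


def round_trip_check(label: str) -> tuple[str, str, bool]:
    """Mimic the round-trip check ToUnicode performs on one ACE label.

    Returns (decoded, reencoded, round_trips).
    """
    if label[:4].lower() != "xn--":
        raise ValueError("expected an ACE label starting with 'xn--'")
    # Step 1: plain Punycode decoding of everything after the ACE prefix.
    decoded = label[4:].encode("ascii").decode("punycode")
    # Step 2: re-encode with the stdlib's IDNA 2003 ToASCII, which runs
    # nameprep on the label before Punycode-encoding it.
    reencoded = encodings.idna.ToASCII(decoded).decode("ascii")
    # Step 3: compare case-insensitively with the original input label.
    return decoded, reencoded, label.lower() == reencoded


def stdlib_codec_rejects(label: str) -> bool:
    """Return True if the built-in 'idna' codec refuses to decode the label."""
    try:
        label.encode("ascii").decode("idna")
    except UnicodeError:
        return True
    return False


if __name__ == "__main__":
    for label in ("xn--grohandel-shop-2fb", "xn--mnchen-3ya"):
        print(label, round_trip_check(label), stdlib_codec_rejects(label))
```

For "xn--grohandel-shop-2fb" the re-encoded form is "grosshandel-shop", which no longer matches the input, while "xn--mnchen-3ya" comes back unchanged and decodes without error.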

The error message also includes the two strings it compared. For the first example you get:

UnicodeError: ('IDNA does not round-trip', 'xn--grohandel-shop-2fb', 'grosshandel-shop')

You get the error because the re-encoding step has converted the ß in "großhandel-shop" to ss, giving "grosshandel-shop". The .de TLD started accepting ß in late 2010, so this is a bug: the codec still applies the old rule, under which ß was always supposed to be converted to ss.
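The ß to ss conversion comes from nameprep (RFC 3491), which IDNA 2003 applies before Punycode encoding. The sketch below, again relying on CPython's internal encodings.idna helpers and using hypothetical function names, suggests that the label in the question is exactly what you get when that mapping is skipped and the label is Punycode-encoded as-is.

```python
import encodings.idna


def idna2003_ace(label: str) -> str:
    """ACE form from Python's built-in IDNA 2003 code (nameprep, then Punycode)."""
    return encodings.idna.ToASCII(label).decode("ascii")


def punycode_only_ace(label: str) -> str:
    """ACE form from Punycode alone, skipping nameprep's ß -> ss mapping.

    Only meaningful for labels that contain non-ASCII characters; it performs
    no validation of any kind.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


label = "großhandel-shop"
print(encodings.idna.nameprep(label))  # grosshandel-shop
print(idna2003_ace(label))             # grosshandel-shop
print(punycode_only_ace(label))        # xn--grohandel-shop-2fb
```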

Your second example is probably corrupt, because it decodes to: "đsottĤmqĐqǗoĔ⢠5"

tripleee
some
    xn--grohandel-shop-2fb has been correctly encoded with IDNA 2008 (which DENIC, the .de registry, has used for a while now). Your Python very likely tries to decode it using the old IDNA 2003, which doesn't know 'ß' and maps it to 'ss'. See https://www.denic.de/en/know-how/idn-domains/ – rockdaboot Jan 23 '17 at 14:56
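If you only need to display such names, one possible workaround following this diagnosis is to bypass the IDNA 2003 machinery and decode each ACE label with the plain punycode codec. This is a minimal sketch, not an IDNA 2008 implementation: decode_ace_domain is a hypothetical helper, and it performs none of the IDNA 2008 validity checks. If adding a dependency is acceptable, the third-party idna package on PyPI implements IDNA 2008 and provides idna.decode for this purpose.

```python
def decode_ace_domain(domain: str) -> str:
    """Hypothetical helper: convert ACE labels back to Unicode with raw Punycode.

    This skips the IDNA 2003 round-trip check, so IDNA 2008 names such as
    xn--grohandel-shop-2fb decode. It does NOT validate anything against the
    IDNA 2008 rules (RFC 5891/5892); treat the output as display text only.
    """
    labels = []
    for label in domain.split("."):
        if label[:4].lower() == "xn--":
            # Everything after the ACE prefix is plain Punycode.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(decode_ace_domain("xn--grohandel-shop-2fb.de"))  # großhandel-shop.de
```

Because nothing is validated, a corrupt label like the second example will still "decode" to whatever code points the Punycode digits describe instead of being rejected.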