I have a working URL that contains Unicode characters, and I am trying to apply IDNA encoding to it:

import urllib

test = ur"http://example.com/%D0%94%D0%B8%D1%81%D0%BA%D0%BE%D0%BD%D1%82-%D1%82%D0%B0%D0%BA%D1%81%D0%B8.22219/"
# unquote the percent-escapes, then try to IDNA-encode the whole URL
url_unq = urllib.unquote(test)
print url_unq
print url_unq.encode("idna")

The code above fails with:

File "C:\Python25\lib\encodings\idna.py", line 38, in nameprep
    raise UnicodeError("Invalid character %r" % c)
UnicodeError: Invalid character u'\x94'

What's wrong with my encodings?

Evgeniy

1 Answer

This is because \x94 cannot be encoded in IDNA. It is a non-ASCII control character, which nameprep prohibits; see RFC 3454, table C.2.2:

0080-009F; [CONTROL CHARACTERS]
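
To see where the \x94 comes from, here is a minimal sketch. It assumes Python 3 (`urllib.parse`) rather than the Python 2.5 used in the question, and the variable names are only illustrative. It decodes the same escapes once code point by code point (Latin-1), which leaves the control character behind, and once as UTF-8:

```python
from urllib.parse import unquote, unquote_to_bytes

# Path fragment taken from the URL in the question.
escaped = "%D0%94%D0%B8%D1%81%D0%BA%D0%BE%D0%BD%D1%82"

# Mapping each %XX escape to its own code point (Latin-1) leaves the
# UTF-8 continuation byte 0x94 behind as the control character U+0094.
mis_decoded = unquote_to_bytes(escaped).decode("latin-1")
c1_controls = [ch for ch in mis_decoded if 0x80 <= ord(ch) <= 0x9F]

# nameprep rejects that character, so the idna codec raises UnicodeError.
try:
    mis_decoded.encode("idna")
    idna_rejected = False
except UnicodeError:
    idna_rejected = True

# Decoding the escapes as UTF-8 yields the intended Cyrillic text.
decoded = unquote(escaped, encoding="utf-8")
```

So the escapes have to be decoded as UTF-8 before anything else is done with the string.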

Kimvais
  • @Eugene - no? The second escape after the `/` is `%94`, which is the same as `\x94` – Kimvais Feb 20 '12 at 12:34
  • Okay, thanks to http://stackoverflow.com/questions/300445/how-to-unquote-a-urlencoded-unicode-string-in-python it's converted now, but I see that I was wrong: the domain name takes an ASCII (IDNA) encoding but the path doesn't, so IDNA will not help here – Evgeniy Feb 20 '12 at 12:39
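
Following up on the last comment, a rough sketch of treating the two parts separately: IDNA for the host only, percent-encoded UTF-8 for the path. This assumes Python 3, and `to_ascii_url` is a hypothetical helper name, not something from the standard library. It also ignores any `user:password@` part and leaves the query string untouched, to keep it short.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Hypothetical helper: return an ASCII-only form of a Unicode URL."""
    parts = urlsplit(url)

    # IDNA applies to the host name only.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)

    # The path is percent-encoded as UTF-8. "%" is kept safe so that
    # escapes already present in the URL are not encoded a second time.
    path = quote(parts.path, safe="/%")

    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))
```

For the URL in the question the host is already ASCII, so only the path changes; a host such as `bücher.example` would be converted to its `xn--` form instead.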