3

I was reading about the IDN homograph attack and couldn't find a clear statement of whether browsers encode only the domain in punycode, or whether the rest of the URL (path and query) is included. So my question is: does any of the popular browsers (FF, IE, Chrome, Safari, Opera) encode the rest of the URL (an IRI, to be exact) with punycode?

Antonio Bakula

1 Answer

5

Only the domain name part is encoded with punycode. This is due to the restrictions imposed on the allowable characters in a (traditional) domain name. The path part of the URL has no such restrictions, so UTF-8 is often used.
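
To make the split concrete, here is a minimal Python sketch of how an IRI could be turned into an ASCII-only URL: the host goes through IDNA (punycode), while the path and query are percent-encoded as UTF-8 bytes. The function name `to_ascii_url` and the sample URL are made up for illustration, the standard library's `idna` codec implements the older IDNA 2003 rules, and ports and userinfo are ignored, so this is not a description of what any particular browser does.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(iri: str) -> str:
    """Hypothetical helper: convert an IRI to an ASCII-only URL.

    Assumes the IRI has a host and no port or userinfo.
    """
    parts = urlsplit(iri)
    # Host: each label is punycoded by the stdlib "idna" codec (IDNA 2003).
    host = parts.hostname.encode("idna").decode("ascii")
    # Path and query: UTF-8 bytes, percent-encoded; no punycode involved.
    path = quote(parts.path, safe="/")
    query = quote(parts.query, safe="=&")
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))


print(to_ascii_url("http://bücher.example/grüße?q=münchen"))
# http://xn--bcher-kva.example/gr%C3%BC%C3%9Fe?q=m%C3%BCnchen
```

Only the host label `bücher` gains the `xn--` prefix; the non-ASCII characters in the path and query show up as `%XX` escapes of their UTF-8 bytes.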

Greg Hewgill
  • I know (or at least presume) that you don't have a magic crystal ball :-), and I don't expect a definitive answer (or any), but could you please share your opinion: what are the chances that some browser starts to use punycode on the path part? – Antonio Bakula Apr 03 '12 at 00:02
  • 1
    I think the chance of that is very low, if not zero. IDNA is just for domain names. Everything I know about it can be found at [Internationalized domain name](https://en.wikipedia.org/wiki/Internationalized_domain_name). – Greg Hewgill Apr 03 '12 at 00:08
  • 3
    @Antonio The path part of the URL doesn't just use UTF-8; it often uses a percent-encoded version of the bytes of the UTF-8 encoding (because at the point where the path is passed, the server is supposed to assume ISO 8859-1 for the bytes actually presented). Punycode is only used for the domain name (and in fact only for its individual pieces) because that is processed at a totally different (and much earlier) stage of URL retrieval. – Donal Fellows Sep 09 '12 at 10:45
  • @DonalFellows is correct in that URLs do not allow raw UTF-8. However, the RFC for URLs (https://tools.ietf.org/html/rfc3986) defines the characters to be ASCII, not 8859-1 (Latin-1). Either way, all Unicode data in the path should come through as percent-encoded values. –  Dec 21 '15 at 17:40