10

Are Chinese characters allowed in URLs?

In my tests, Chinese characters can be entered in a URL: the browser converts the domain to punycode, sends the request, and reaches the intended page.

But does anyone currently validate website URLs in a way that also allows Chinese characters?

Makoto
deepWebMie

2 Answers

11

Punycode exists so that non-Latin scripts can be used in software that does not support them. So whilst I like my site http://見.香港/, I can enter http://xn--nw2a.xn--j6w193g/ if I cannot enter the original Unicode form.

Some website developers program overly defensively. For example, Google Apps does not let you use punycode domains at all, due to aggressive whitelisting that has not been updated to follow ICANN standards.

UPDATE: Stackoverflow now supports Unicode domain names and thus comments below are outdated. The unusual domain name is the punycode, i.e. encoded, version of Unicode for systems that do not directly support Unicode.

xn--nw2a = 見
xn--j6w193g = 香港
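
The mapping above can be reproduced with a short sketch. It assumes Python's built-in `idna` codec (which implements the older IDNA 2003 rules, so other domains may convert differently under newer rules); the variable names are illustrative only.

```python
# Sketch: convert between the Unicode and punycode forms of a host name
# using Python's built-in "idna" codec (IDNA 2003).
unicode_host = "見.香港"

# Unicode -> ASCII (punycode); each dot-separated label is encoded separately.
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--nw2a.xn--j6w193g

# ASCII (punycode) -> Unicode, for display.
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip)  # 見.香港
```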

As of 2022/1/1, Stackoverflow has a feature that interprets punycode domains as their Unicode form in preview, but not when saved. This is not really appropriate for a code platform which may be discussing punycode, but would be fine for other sites in the exchange.

[Screenshot: Stackoverflow edit preview with a punycode domain]

Steve-o
  • As a perfect example, see how Stackoverflow itself does not parse Unicode domain names. – Steve-o Aug 25 '11 at 03:51
  • It does not parse Unicode domain names? Does that mean it is not necessary to validate Chinese input for URLs, and that I can just validate the normal way, allowing only alphanumerics, hyphen, underscore and dot? – deepWebMie Aug 25 '11 at 06:39
  • @deepWebMie you cannot click my Unicode link above. Ideally you should support Unicode URLs. There is no rulebook saying you MUST, but you must consider these are new features and will take time to be commonly handled correctly. – Steve-o Aug 25 '11 at 06:43
  • Yes, I cannot click on your Unicode link above, but I can click on your punycode link, and it then directs me to http://見.香港/. – deepWebMie Aug 25 '11 at 06:53
  • Users need to be able to enter Chinese URLs. – deepWebMie Aug 26 '11 at 08:56
  • @Steve-o, Why would you choose to use such a weird URL like http://xn--nw2a.xn--j6w193g/ when with the same cost you can get a "proper" domain name? – Pacerier Jul 17 '15 at 11:53
1

All non-ASCII characters present in a domain name will (or should) be converted to punycode. It is the browser's business to display them as the original characters.
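
One way to apply this to validation is to convert the host to its ASCII form first and only then apply the usual letters, digits and hyphen check. The following is a minimal sketch under that assumption, not a complete URL validator; `host_to_ascii` and `_LABEL` are hypothetical names, and it relies on Python's built-in `idna` codec (IDNA 2003).

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule: each label is 1-63 chars of letters, digits or hyphens,
# and does not start or end with a hyphen.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def host_to_ascii(url):
    """Return the host of `url` in ASCII (punycode) form, or raise ValueError."""
    host = urlsplit(url).hostname  # lowercased host, without the port
    if not host:
        raise ValueError("URL has no host")
    try:
        # Non-ASCII labels become "xn--..." labels; ASCII labels pass through.
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError("host cannot be converted to ASCII") from exc
    labels = ascii_host.rstrip(".").split(".")
    if not all(_LABEL.match(label) for label in labels):
        raise ValueError("host contains an invalid label")
    return ascii_host


print(host_to_ascii("http://見.香港/"))
print(host_to_ascii("http://example.com/path"))
```

With this approach the Unicode and punycode spellings of the same domain validate to the same ASCII host, so the rest of the application only has to handle one form.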

zerkms
  • The important historical note is that many browsers disable automatic rendition of Unicode due to the [security implications](http://unicode.org/reports/tr36/tr36-8.html) of similar looking Unicode entities. – Steve-o Aug 25 '11 at 03:53