
Context: I am creating an app that stores its data in the location.hash. I want to encode as few characters as possible to maintain maximum legibility.

As explained in this answer, reserved characters differ for each component of the URL. So which characters are allowed, unencoded, in the URL fragment (location.hash) specifically?

Related post: Unicode characters in URLs

Asked by Thoran
  • See also: http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers?rq=1 – Rowland Shaw Nov 16 '16 at 15:49
  • Thanks for pointing that out. I am targeting modern browsers, so it is not an issue. – Thoran Nov 16 '16 at 15:59
  • Does this answer your question? [URL fragment (#) allowed characters](https://stackoverflow.com/questions/26088849/url-fragment-allowed-characters) – Mingwei Samuel Aug 12 '21 at 19:15

1 Answer


According to RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax), Section 3.5, the fragment is defined as:

fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
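Here is a minimal Python sketch that turns this grammar into a regular expression. You can use it to check whether a string can go after the `#` as-is. The name `is_valid_fragment` is hypothetical, and the check covers only the fragment text, without the leading `#`:

```python
import re

# Character classes copied from the RFC 3986 ABNF above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
# fragment = *( pchar / "/" / "?" )
FRAGMENT_RE = re.compile(rf"(?:{PCHAR}|[/?])*")


def is_valid_fragment(text: str) -> bool:
    """Return True if `text` (without the leading '#') matches the fragment rule."""
    return FRAGMENT_RE.fullmatch(text) is not None
```

Under this reading, `is_valid_fragment("a b")` is `False` because a space is not in the grammar. `"caf%C3%A9"` passes because `%` followed by two hex digits is a valid `pct-encoded` triplet.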

Unpacking all that, and ignoring percent-encoding, I find the following set of 81 characters that may appear in a fragment unencoded:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~!$&'()*+,;=:@/?
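Under the same reading, an app can keep its hash legible by percent-encoding only the characters outside this set. This is a minimal sketch using Python's standard `urllib.parse.quote`, whose `safe` argument lists extra characters to leave untouched. `FRAGMENT_SAFE` and `encode_fragment` are hypothetical names. Note that `%` itself is not in the set, so a literal `%` in the data becomes `%25`:

```python
import string
from urllib.parse import quote

# The 81 characters unpacked from the grammar (percent-encoding ignored).
FRAGMENT_SAFE = (
    string.ascii_letters + string.digits  # ALPHA / DIGIT
    + "-._~"                               # rest of unreserved
    + "!$&'()*+,;="                        # sub-delims
    + ":@"                                 # extra pchar characters
    + "/?"                                 # extra fragment characters
)


def encode_fragment(data: str) -> str:
    """Percent-encode every character outside FRAGMENT_SAFE.

    quote() encodes non-ASCII characters as UTF-8 octets by default.
    """
    return quote(data, safe=FRAGMENT_SAFE)
```

For example, `encode_fragment("100% done #1")` returns `100%25%20done%20%231`, while a route like `/todo?id=3&done=true` passes through unchanged.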

Although the RFC does not mandate a particular encoding and deals in characters only (not bytes), Section 2.3 makes clear that ALPHA means ASCII only, i.e. the 26 upper- and lowercase letters of the basic Latin alphabet. Any non-ASCII letter must therefore be percent-encoded.
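The RFC leaves the byte encoding open, so this sketch assumes UTF-8 to show what percent-encoding a non-ASCII letter looks like. UTF-8 is the default for Python's `quote`/`unquote`. `encode_non_ascii` is a hypothetical helper that touches only non-ASCII characters:

```python
from urllib.parse import unquote


def encode_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with its UTF-8 octets, percent-encoded.

    Assumption: UTF-8 is the byte encoding. ASCII characters pass through
    untouched, so characters like space or '#' still need encoding separately.
    """
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            # e.g. 'é' (U+00E9) -> bytes C3 A9 -> '%C3%A9'
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


encoded = encode_non_ascii("café/سلام")
decoded = unquote(encoded)  # unquote() decodes percent-escapes as UTF-8 by default
```

Each Arabic or Persian letter expands to two UTF-8 octets, i.e. six characters such as `%D8%B3`. That is the legibility cost of keeping such text in the hash in its encoded form.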

Answered by Thomas
  • Thanks @Thomas for your well-referenced answer. I see a lot of web apps using the URL hash for routing, and in some cases I see locale-specific characters (e.g. Persian, Arabic) appearing in the address bar. Is Chrome just displaying them in a human-readable form? If not, what is the logic behind it? – Thoran Nov 16 '16 at 19:20
  • I have added my interpretation of the RFC to the post. Yet, using non-ASCII characters seems to work fine in Chrome. See also [this question](http://stackoverflow.com/questions/2742852/unicode-characters-in-urls). – Thomas Nov 17 '16 at 09:09