93

I'm using the fragment identifier to create a permalink for AJAX events in my web app similar to this guy. Something like:

http://www.myapp.com/calendar#filter:year/2010/month/5

I've done quite a bit of searching but can't find a list of valid characters for the fragment identifier. The W3C spec doesn't offer anything.

Do I need to encode the characters the same way as in the rest of the URL?

There doesn't seem to be any good information on this anywhere.

sohtimsso1970

3 Answers

110

See the RFC 3986.

fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So a fragment may contain letters and digits ([a-zA-Z0-9]), percent-encoded octets (anything matching %[0-9a-fA-F]{2}), and the characters -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, /, and ?
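As a rough illustration, the grammar above can be turned into a regular expression that checks a candidate fragment. This is a minimal sketch, not an official validator: the names `FRAGMENT_RE` and `isValidFragment` are made up for this example, and the string is assumed to be passed without its leading `#`.

```javascript
// Hypothetical helper (not from the RFC): tests a string against the
// RFC 3986 `fragment` rule. Pass the fragment WITHOUT the leading "#".
//   unreserved: A-Z a-z 0-9 - . _ ~
//   sub-delims: ! $ & ' ( ) * + , ; =
//   extra:      : @ / ?
//   pct-encoded: "%" followed by exactly two hex digits
const FRAGMENT_RE = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

function isValidFragment(fragment) {
  return FRAGMENT_RE.test(fragment);
}

console.log(isValidFragment("filter:year/2010/month/5")); // true
console.log(isValidFragment("a b"));    // false (raw space)
console.log(isValidFragment("100%"));   // false (raw percent sign)
console.log(isValidFragment("100%25")); // true  (percent sign escaped)
```

Note that the empty string also passes, since the grammar allows zero characters.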

Artefacto
  • Perfect, I was looking for that in the RFC but couldn't seem to find the right clause. Thanks. – sohtimsso1970 May 17 '10 at 14:30
  • @Artefacto, So does it mean that a "%" is **not** allowed everywhere, but **only** allowed when two valid characters follow it? – Pacerier Oct 11 '14 at 17:37
  • @Pacerier yes, `%` is only allowed as an escape character. Use `%25` to encode a single `%`. – gioele Feb 01 '16 at 09:33
  • The back / forward button doesn't work with fragment identifiers that contain a colon, even though the RFC says it's a valid character. – Vince Mar 01 '16 at 23:55
  • Wow! It would probably be easier to list which ASCII characters *cannot* be used! – Déjà vu Jun 14 '16 at 07:57
  • In case anyone wants a quick and dirty sanitizer like I did: `myFragment.replace(/(?=((?:[\!\$&'\(\)\*\+,;=a-zA-Z0-9\-._~:@\/?]|%[0-9a-fA-F]{2})*))\1./g, "$1-");` Replace the - in the "$1-" with the desired placeholder char – wils Jun 02 '18 at 18:36
  • So... basically base 81. Not a clean way to use that... – William Entriken Apr 07 '21 at 01:47
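For readers who find the one-line sanitizer in the comments hard to follow, the same idea can be spelled out with a replacer callback. This is only a sketch under assumptions: `sanitizeFragment` and its `placeholder` parameter are names invented here, and the fragment is assumed to be given without the leading `#`.

```javascript
// Hypothetical helper: replaces every character that RFC 3986 does not
// allow in a fragment with a placeholder, leaving valid %XX escapes alone.
function sanitizeFragment(fragment, placeholder = "-") {
  // Alternative 1 captures a well-formed percent-escape (kept as-is).
  // Alternative 2 matches any single character outside the allowed set,
  // which includes a raw "%" that is not part of a valid escape.
  // The "u" flag makes astral characters count as one character.
  const re = /(%[0-9A-Fa-f]{2})|[^A-Za-z0-9._~!$&'()*+,;=:@\/?-]/gu;
  return fragment.replace(re, (match, pct) =>
    pct !== undefined ? pct : placeholder
  );
}

console.log(sanitizeFragment("a b%zz%20c")); // "a-b-zz%20c"
```

Unlike the `.`-based pattern, the negated character class here also matches newlines.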
32

https://www.rfc-editor.org/rfc/rfc3986#section-3.5:

fragment    = *( pchar / "/" / "?" )

and

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="
pct-encoded   = "%" HEXDIG HEXDIG

So, combined, the fragment cannot contain #, a raw %, ^, [, ], {, }, \, ", < and > according to the RFC.
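If arbitrary text has to go into a fragment, the disallowed characters can be percent-encoded rather than rejected. The sketch below leans on the built-in `encodeURI`, which happens to leave every character permitted by the grammar above unescaped but also leaves `#` alone, so that one is escaped by hand. The name `encodeFragment` is made up for this example, and `encodeURI` throws a `URIError` on lone surrogates.

```javascript
// Hypothetical helper: percent-encodes text so that the result matches
// the RFC 3986 `fragment` rule. Existing "%" signs are escaped to "%25",
// so do not pass text that is already percent-encoded.
function encodeFragment(raw) {
  // encodeURI escapes space, ", <, >, [, ], \, ^, `, {, |, }, % and
  // non-ASCII characters (as UTF-8), but it keeps "#", which a fragment
  // cannot contain in raw form.
  return encodeURI(raw).replace(/#/g, "%23");
}

console.log(encodeFragment("a b|c#d")); // "a%20b%7Cc%23d"
console.log(encodeFragment("100%"));    // "100%25"

// Round trip: decodeURIComponent reverses every escape.
console.log(decodeURIComponent(encodeFragment("a b|c#d"))); // "a b|c#d"
```

`encodeURIComponent` would also produce a valid fragment, but it additionally escapes characters such as `/`, `:` and `=` that the grammar allows, which makes permalinks like the one in the question harder to read.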

kennytm
  • Thanks. Gave the answer to Artefacto since he was a hair faster but gave you +1 for the response. – sohtimsso1970 May 17 '10 at 14:30
  • I suppose you're missing non-printable ASCII characters and non-ASCII characters. – Artefacto May 17 '10 at 14:30
  • It seems you forgot `VERTICAL BAR (|)`, ``GRAVE ACCENT (`)``, and `SPACE ( )` in the not-list. So the full list of printable (7-bit) US-ASCII characters in the not-list is: ``"#%< >[\]^`{|}`` – GitaarLAB Nov 16 '17 at 16:02
2

Another RFC also covers this: RFC 1738.

URL schemeparts for ip based protocols:
HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
sirkazey