98

Which characters are allowed in GET parameters without encoding or escaping them? I mean something like this:

http://www.example.org/page.php?name=XYZ

What can you have there instead of XYZ? I think only the following characters:

  • a-z (A-Z)
  • 0-9
  • _

Is this the full list or are there additional characters allowed?

starball
caw
  • possible duplicate of [HTTP URL - allowed characters in parameter names](http://stackoverflow.com/questions/814700/http-url-allowed-characters-in-parameter-names) – j0k Aug 09 '12 at 10:47
  • 2
    @j0k: Not really a duplicate: the other question is about cases where escaping is required, whereas here the goal is to avoid it. – Marcel Sep 13 '12 at 06:57

7 Answers

133

There are reserved characters, which have special meaning: the general delimiters :/?#[]@ and the sub-delimiters !$&'()*+,;=

There is also a set of unreserved characters (alphanumerics and -._~), which do not need to be encoded.

That means that anything outside the unreserved set is supposed to be %-encoded when it is not being used for its special meaning (e.g. when passed as part of a GET parameter).
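A minimal sketch of that rule in Python, assuming Python 3.7 or later (where `urllib.parse.quote` treats `~` as unreserved). The helper name `encode_param` is hypothetical; passing `safe=""` removes `quote`'s default exemption for `/`, so only the unreserved characters pass through untouched.

```python
from urllib.parse import quote


def encode_param(value: str) -> str:
    """Percent-encode everything except the unreserved characters.

    Hypothetical helper for illustration: safe="" means that only
    letters, digits and "-", ".", "_", "~" are left as-is.
    """
    return quote(value, safe="")


# Unreserved characters are left alone.
print(encode_param("~my_start-page.en"))  # ~my_start-page.en

# Reserved characters used as data are percent-encoded.
print(encode_param("a&b=c/d"))  # a%26b%3Dc%2Fd
```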

See also RFC3986: Uniform Resource Identifier (URI): Generic Syntax

wearego
Michael Krelin - hacker
  • 2
    Thank you very much! So I have to add . and ~ to my list? Can I write index.php?page=start_en-new~. without escaping it? – caw Sep 21 '09 at 18:32
  • 3
    It would be somewhat too bold a statement to say you can't, but you shouldn't. If you were to normalize URI you'd *have* to escape unreserved characters (and only unreserved), but it is very likely that it will actually *work* unescaped. – Michael Krelin - hacker Sep 21 '09 at 18:38
  • Generally, you have the escape function that escapes everything that needs to be escaped. And you normally use this function to escape *all* parameters you pass. – Michael Krelin - hacker Sep 21 '09 at 18:39
  • So I shouldn't use ~ and . unescaped, either? So only alphanumeric? Is urlencode() in PHP the function you mean? I could pass all characters to urlencode() and see what goes out unescaped!? – caw Sep 21 '09 at 19:19
  • 3
    OMG, I haven't looked carefully at your example. I thought that was just a generic bunch of special characters ;-) No, you don't have to escape those, of course, as they are unreserved. Sorry for confusion. As for `urlencode()` I have no idea if it works correctly - it's not always the case with PHP functions - but if it does then yes, you can test with it ;-) Like I said - escape everything but unreserved. – Michael Krelin - hacker Sep 21 '09 at 20:02
  • :) Thanks. So I create a page with the name "~my_start-page.en" and pass the name via GET without any problems, correct? page.php?name=~my_start-page.en – caw Sep 21 '09 at 21:36
  • Yes, that should be it. Those characters are safe as query parameters with no escaping. Whether you will have problems processing that name later I don't know, but you can pass it with no problems ;-) – Michael Krelin - hacker Sep 21 '09 at 22:30
  • You're right, ~ and . seem to work fine. But what about the other answers here? They mention other characters which can be used unencoded as well. Why didn't you mention them? Are the other answers wrong? – caw Sep 22 '09 at 16:05
  • I did mention RFC on URI syntax, didn't I? And the newest of all RFCs mentioned too! ;-) Actually, like I said, some other approaches to escaping may go unpunished, but still non standard-conformant. As long as URIs are to be normalized and compared for equality in normalized form the punishment will follow the crime ;-) – Michael Krelin - hacker Sep 22 '09 at 18:03
  • So the RFCs mentioned in the other questions are about 8 years older and contain special chars which aren't allowed unencoded anymore? – caw Sep 22 '09 at 18:11
  • I haven't really read those RFCs so I don't know what they deal with. But what those who reference them say is that it deals with *characters allowed in the URL*. Obviously, `&` is also allowed, but it has special meaning, so I suspect they answered different question. – Michael Krelin - hacker Sep 22 '09 at 18:13
  • 1
    The RFC says that it is actually allowed to leave the characters `/` and `?` unescaped. I was looking this up because Swift does not escape these in its `stringByAddingPercentEncodingForURLQueryParameter` method! (Correctly, apparently) – Stijn Aug 22 '16 at 16:18
  • I believe this answer is incorrect. The RFC does not say that all unreserved characters need to be percent-encoded. It actually says, "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed." The answer by @dmitriy explains it well. – paulkernfeld May 18 '22 at 19:19
  • @paulkernfeld, isn't that exactly what the answer says? – Michael Krelin - hacker May 19 '22 at 08:42
  • I'm not sure! I'm having trouble parsing the final sentence of the answer. It might help to concisely and explicitly state exactly which characters must be percent-encoded at the beginning of the answer. – paulkernfeld May 20 '22 at 16:59
  • @paulkernfeld, I believe it is. As for putting it into beginning of the answer, I don't feel comfortable referring to the entities that are not yet defined :) Also, I do not see how the praised dmitri's answer deals with it any better. It's a fine answer, just more verbose, with more RFC content copied and 9 years late :) – Michael Krelin - hacker May 20 '22 at 19:37
25

The question asks which characters are allowed in GET parameters without encoding or escaping them.

According to RFC3986 (general URL syntax) and RFC7230, section 2.7.1 (HTTP/S URL syntax), the only characters you need to percent-encode are those outside the query set (see the definition below).

However, additional specifications such as HTML5, Web forms, and the obsolete Indexed search W3C recommendation give special meaning to some characters, notably the symbols = & + ;.

Other answers here suggest that most of the reserved characters should be encoded, including "/" and "?". That's not correct. In fact, RFC3986, section 3.4 advises against percent-encoding the "/" and "?" characters:

it is sometimes better for usability to avoid percent-encoding those characters.

RFC3986 defines query component as:

query       = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~" 

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.
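As a rough illustration, the grammar above can be turned into a regular expression that checks whether a string is already a syntactically valid query component. This is only a sketch: the names `QUERY_RE` and `is_valid_query` are made up for this example, and the check says nothing about how a server will interpret characters such as `&` or `=`.

```python
import re

# query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_RE = re.compile(
    r"(?:"
    r"[A-Za-z0-9\-._~]"      # unreserved
    r"|%[0-9A-Fa-f]{2}"      # pct-encoded
    r"|[!$&'()*+,;=]"        # sub-delims
    r"|[:@/?]"               # extra characters allowed in a query
    r")*"
)


def is_valid_query(query: str) -> bool:
    """Return True if every character fits the RFC3986 query production."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("name=XYZ"))      # True
print(is_valid_query("path=/a/b?c"))   # True
print(is_valid_query("name=a b"))      # False, raw space
print(is_valid_query("a[1]=x"))        # False, "[" and "]" are outside the set
```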

The conclusion is that the XYZ part should encode:

special: # % = & ;
Space
sub-delims
out of query set: [ ]
non-ASCII characters

The exception is when the special symbols = & ; are used as key=value separators.

Encoding other characters is allowed but not necessary.
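One possible way to apply this conclusion in Python, as a sketch rather than a definitive implementation: `urllib.parse.quote` always leaves the unreserved characters alone, and its `safe` argument can additionally keep `:`, `@`, `/` and `?`. Everything else in the list (specials, space, sub-delims, brackets, non-ASCII) gets percent-encoded. The function name `encode_query_value` is hypothetical.

```python
from urllib.parse import quote


def encode_query_value(value: str) -> str:
    """Encode a query value, leaving unreserved characters and : @ / ? as-is.

    Sub-delims, "#", "%", space, "[", "]" and non-ASCII characters
    (as UTF-8 bytes) are percent-encoded.
    """
    return quote(value, safe=":@/?")


print(encode_query_value("a/b?c=d&e f"))  # a/b?c%3Dd%26e%20f
print(encode_query_value("x[1]#y"))       # x%5B1%5D%23y
print(encode_query_value("100%"))         # 100%25
```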

Community
dmitri
  • Doesn't presence in the "sub-delims" set mean that `"!" / "$" / "&" ...` are "being used as a delimiter of, or within, the component." and therefore should be percent-encoded? – lmsurprenant Sep 06 '19 at 12:48
  • Sub-delimiters are not delimiters in query and therefore should not be escaped. – Basilevs Jul 01 '21 at 13:38
8

I did a test using the Chrome address bar and a $QUERY_STRING in bash, and observed the following:

~!@$%^&*()-_=+[{]}\|;:',./? and grave (backtick) are passed through as plaintext.

Space, ", < and > are converted to %20, %22, %3C and %3E respectively.

# is ignored, since it is used by ye olde anchor.

Personally, I'd say bite the bullet and encode with base64 :)
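If you do go the base64 route, a sketch in Python might look like the following. It assumes the URL-safe base64 alphabet (which uses `-` and `_` instead of `+` and `/`) and strips the `=` padding, since `=` has special meaning in a query string. The helper names `to_param` and `from_param` are made up for this example.

```python
import base64


def to_param(raw: str) -> str:
    """Encode arbitrary text into a token made only of A-Z, a-z, 0-9, "-" and "_"."""
    token = base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")
    return token.rstrip("=")  # drop padding, "=" is special in query strings


def from_param(token: str) -> str:
    """Reverse to_param by restoring the stripped padding before decoding."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


token = to_param("a&b=c d/é?")
print(token)
print(from_param(token))  # a&b=c d/é?
```

The trade-off is that the parameter is no longer human-readable, and both ends have to agree on the scheme.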

jimmetry
  • These characters you mention are probably the ones that will be escaped in HTML, not the query string. I don't believe =, ? and & can be passed in plain text. – Luc Bloom Feb 24 '18 at 11:21
  • Appreciate your effort, but it really doesn't mean a lot to us, as a reserved character could be accepted by Chrome today but not tomorrow, or other clients could reject it. It is much safer to go with the official definition, which is: `ALPHA / DIGIT / "-" / "." / "_" / "~"` – Muleskinner Jan 10 '20 at 12:57
8

All of the rules concerning the encoding of URIs (which include URNs and URLs) are specified in RFC1738 and RFC3986. Here's a TL;DR of these long and boring documents:

Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a URI under certain circumstances. The characters allowed in a URI are either reserved or unreserved. Reserved characters are those characters that sometimes have special meaning, but they are not the only characters that need encoding.

There are 66 unreserved characters that don't need any encoding: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~

There are 18 reserved characters which need to be encoded: !*'();:@&=+$,/?#[], and all the other characters must be encoded.

To percent-encode a character, simply concatenate "%" and its ASCII value in hexadecimal. The PHP functions urlencode and rawurlencode do this job for you, as do the JavaScript functions encodeURIComponent and encodeURI.
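To make the mechanism concrete, here is a small Python sketch of percent-encoding done by hand. The function name `percent_encode_char` is hypothetical. One assumption goes beyond the text above: for characters outside ASCII, the sketch encodes each byte of the UTF-8 representation, which is also what `urllib.parse.quote` does by default.

```python
from urllib.parse import quote


def percent_encode_char(ch: str) -> str:
    """Return "%XX" for each byte of the character's UTF-8 encoding."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


print(percent_encode_char(" "))  # %20
print(percent_encode_char("&"))  # %26

# The library function gives the same result for a reserved character.
print(quote("&", safe="") == percent_encode_char("&"))  # True
```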

Nino Filiu
5

From RFC 1738 on which characters are allowed in URLs:

Only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

The reserved characters are ";", "/", "?", ":", "@", "=" and "&", which means you need to URL-encode them if you wish to use them as data rather than for their reserved purposes.

ctford
4

Alphanumeric characters and all of

~ - _ . ! * ' ( ) ,

are valid within a URL.

All other characters must be encoded.

user66001
womp
  • Thanks, you've understood everything correctly. I want to know which characters I can use without encoding them. Are you sure that !*'(), are such characters? – caw Sep 21 '09 at 18:35
  • Per ctford's answer referring to RFC 1738, the dollar sign is also a special character that does not need encoding. – рüффп Jul 21 '16 at 10:57
0

"." | "!" | "~" | "*" | "'" | "(" | ")" are also acceptable [RFC2396]. Really, anything can be in a GET parameter if it is properly encoded.

geowa4
  • But those have special meaning, so if you want to *send* % or + you **have** to encode them. – Esteban Küber Sep 21 '09 at 17:02
  • yeah i don't know why i wrote % – geowa4 Sep 21 '09 at 17:05
  • Thank you! I only want to know which characters can be used WITHOUT encoding or escaping them. I should have pointed out this better. So can I really use *!'()| without encoding them? – caw Sep 21 '09 at 18:36