12

What characters are allowed in a URL query string?

Do query strings have to follow a particular format?

Aran Mulholland
  • 1
    Anything other than the characters that need to be escaped is allowed in a URL. For what needs to be escaped, see this question: http://stackoverflow.com/questions/2322764/what-characters-must-be-escaped-in-an-http-query-string –  Nov 14 '12 at 05:33

3 Answers

13

Per https://www.rfc-editor.org/rfc/rfc3986

In section 2.2 Reserved Characters, the following characters are listed:

reserved = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

The spec then says:

If data for a URI component would conflict with a reserved character’s purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

Next, in section 2.3 Unreserved Characters, the following are listed:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
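To make the two character classes concrete, here is a minimal Python sketch. The helper name `classify` is hypothetical (it is not part of any library), and the sample value is made up. The sketch percent-encodes every reserved character in a piece of data, which is the conservative choice; whether a given reserved character actually conflicts depends on the URI component it appears in.

```python
import string
from urllib.parse import quote

# Character classes transcribed from RFC 3986 sections 2.2 and 2.3.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS
UNRESERVED = string.ascii_letters + string.digits + "-._~"


def classify(ch: str) -> str:
    """Hypothetical helper: name the RFC 3986 class of a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "other"


# Data that would conflict with a delimiter is percent-encoded before the
# URI is formed. safe="" asks quote() to encode every reserved character.
value = "a&b=c"
encoded = quote(value, safe="")

print(classify("~"), classify("&"), classify(" "))
print(encoded)
```

Here `&` and `=` inside the value are encoded so that they cannot be mistaken for pair or name/value separators.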

Community
Steven
  • 3
    [RFC 3986 - **Section 3.4**](http://tools.ietf.org/html/rfc3986#page-23) specifically describes the query string and notably includes the sub-delims and a handful of others. In summary: `A`-`Z`, `a`-`z`, `0`-`9`, `-`, `.`, `_`, `~`, `!`, `$`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `;`, `=`, `:`, `@`, `/`, `?` – MrWhite Mar 23 '15 at 21:01
  • @MrWhite It's been a while since your comment, but what does your summary mean in plain English? Do these characters need to be encoded or not? I've looked at [section 3.4](https://www.rfc-editor.org/rfc/rfc3986#section-3.4) but didn't see a list. – Abhijit Sarkar Feb 14 '22 at 20:56
12

Wikipedia has your answer: http://en.wikipedia.org/wiki/Query_string

"URL Encoding: Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document; the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a schema known as URL encoding.

In particular, encoding the query string uses the following rules:

  • Letters (A-Z and a-z), numbers (0-9) and the characters '.','-','~' and '_' are left as-is
  • SPACE is encoded as '+' or %20[citation needed]
  • All other characters are encoded as %FF hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)

The octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation. The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 1738."
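The rules quoted above can be tried out with Python's standard `urllib.parse` module. This is only a sketch with a made-up sample string; it assumes Python 3.7 or later, where `~` is treated as an unreserved character and left as-is.

```python
from urllib.parse import quote, quote_plus, unquote

text = "a b~c/é"

# quote_plus() follows the form-style convention: SPACE becomes '+'.
form_style = quote_plus(text)

# quote() with safe="" encodes SPACE as %20 and also encodes '/'.
percent_style = quote(text, safe="")

# Non-ASCII characters are first encoded as UTF-8, then each byte as %HH.
print(form_style)
print(percent_style)

# "%7E" and "~" are interchangeable when decoding.
print(unquote("%7E"))
```

Both results leave letters, digits, and `~` untouched; they differ only in how the space is written.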

Regarding the format, query strings are name-value pairs. The ? separates the query string from the rest of the URL. Pairs are separated from one another by an ampersand (&), while the name (key) and value within each pair are separated by an equals sign (=), e.g. http://domain.com?key=value&secondkey=secondvalue
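As a small illustration of that format, the following Python sketch builds a query string from name-value pairs and parses it back using the standard library. The parameter names and values are invented for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical parameters; the second value contains a space and an ampersand.
params = {"key": "value", "secondkey": "second value & more"}

# urlencode() joins pairs with '&', joins name and value with '=',
# and escapes characters that would otherwise act as separators.
query = urlencode(params)
url = "http://domain.com/?" + query

# urlsplit() separates the query from the rest of the URL at the '?'.
parts = urlsplit(url)
decoded = parse_qs(parts.query)

print(query)
print(decoded)
```

Note that `parse_qs` returns a list for each name, because the same name may appear more than once in a query string.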

Under Structure in the Wikipedia reference I provided:

  • The question mark is used as a separator and is not part of the query string.
  • The query string is composed of a series of field-value pairs
  • Within each pair, the field name and value are separated by an equals sign, '='.
  • The series of pairs is separated by the ampersand, '&' (or semicolon, ';' for URLs embedded in HTML and not generated by a ...; see below).
  • W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands.
Clarice Bouwer
  • Can you provide a citation for the final paragraph? – Lightness Races in Orbit Jan 27 '14 at 16:14
  • I added that paragraph based on personal experience but I've updated and added more information that I could find to back it up. In doing so, I noticed that key-values are not only separated by an ampersand but can be by a semi-colon although I've never come across it before. Also, the question mark is not part of the QS but is rather a separator. – Clarice Bouwer Jan 31 '14 at 11:21
  • 1
    In the text of the answer: "each name value pair is prefixed with an ampersand" the wording ("prefixed") is misleading. Farther down, there is the correct "...pairs is separated...". – laune Jul 11 '14 at 11:15
1

This link has the answer and formatted values you all need.

https://perishablepress.com/url-character-codes/

For your convenience, this is the list:

<     %3C
>     %3E
#     %23
%     %25
{     %7B
}     %7D
|     %7C
\     %5C
^     %5E
~     %7E
[     %5B
]     %5D
`     %60
;     %3B
/     %2F
?     %3F
:     %3A
@     %40
=     %3D
&     %26
$     %24
+     %2B
"     %22
space     %20
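Each code in the list is simply a percent sign followed by the character's byte value in hexadecimal. A minimal Python sketch that reproduces a few of the rows, where `percent_encode_char` is a hypothetical helper written for this example:

```python
def percent_encode_char(ch: str) -> str:
    """Hypothetical helper: percent-encode one character via its UTF-8 bytes."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


# A handful of characters taken from the list above.
sample = ["<", ">", "#", "%", "~", "&", " "]
table = {ch: percent_encode_char(ch) for ch in sample}

for ch, code in table.items():
    print(repr(ch), code)
```

Whether a character in the list actually has to be encoded depends on where it appears; `~`, for instance, is unreserved in RFC 3986 and can be left as-is.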
pid
  • 1
    Note that [link-only answers](http://meta.stackoverflow.com/tags/link-only-answers/info) are discouraged, SO answers should be the end-point of a search for a solution (vs. yet another stopover of references, which tend to get stale over time). Please consider adding a stand-alone synopsis here, keeping the link as a reference. – kleopatra Jul 21 '15 at 10:42