
Are square brackets in URLs allowed?

I noticed that Apache Commons HttpClient (3.0.1) throws an IOException for URLs containing square brackets, whereas wget and Firefox accept them.

URL example:

http://example.com/path/to/file[3].html

My HTTP client encounters such URLs, but I'm not sure whether to patch the code or to throw an exception (as it actually should).

oHo
Benedikt Waldvogel
  • Firefox shows you a user-friendly URL in the address bar, but the URL it actually sends has the special characters encoded. – DJDaveMark Jan 16 '20 at 08:38
  • Many versions of WordPress and Magento use unencoded square brackets, so if you are writing a client I would suggest emitting only a warning or message-level issue. Ultimately, you should assume application developers will not give you pristine input, and you don't want to rely on behavior that currently depends only on the app's gateway of choice. – That Realty Programmer Guy Sep 23 '21 at 23:44

10 Answers


RFC 3986 states:

A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.

So in theory you should not see such URIs in the wild, as the brackets should arrive percent-encoded.
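As a hedged illustration in Java (the question's language), here is a hypothetical helper `isStrictlyValid` that uses the single-argument `java.net.URI` constructor as a strict check. Note that `java.net.URI` follows RFC 2396 as amended by RFC 2732 rather than RFC 3986, but it likewise rejects raw brackets in a path and accepts them around an IPv6 literal host.

```java
import java.net.URI;
import java.net.URISyntaxException;

class StrictUriCheck {
    // Hypothetical helper: true if java.net.URI's strict parser accepts the string.
    static boolean isStrictlyValid(String raw) {
        try {
            new URI(raw);
            return true;
        } catch (URISyntaxException e) {
            // e.g. "Illegal character in path at index ..."
            return false;
        }
    }

    public static void main(String[] args) {
        // Raw brackets in the path are rejected.
        System.out.println(isStrictlyValid("http://example.com/path/to/file[3].html"));   // false
        // The percent-encoded form is accepted.
        System.out.println(isStrictlyValid("http://example.com/path/to/file%5B3%5D.html")); // true
        // Brackets around an IPv6 literal host are the permitted case.
        System.out.println(isStrictlyValid("http://[::1]/index.html"));                   // true
    }
}
```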

Mark Elliot
Justin Cormack

Square brackets [ and ] in URLs are often not supported.

Replace them by %5B and %5D:

  • Using a command line, the following example is based on bash and sed:

    url='http://example.com?day=[0-3][0-9]'
    encoded_url="$( sed 's/\[/%5B/g;s/]/%5D/g' <<< "$url")"
    
  • Using Java URLEncoder.encode(String s, String enc)

  • Using PHP rawurlencode() or urlencode()

    <?php
    echo '<a href="http://example.com/day/',
        rawurlencode('[0-3][0-9]'), '">';
    ?>
    

    output:

    <a href="http://example.com/day/%5B0-3%5D%5B0-9%5D">
    

    or:

    <?php
    $query_string = 'day=' . urlencode('[0-3][0-9]') .
                    '&month=' . urlencode('[0-1][0-9]');
    echo '<a href="http://example.com?',
          htmlentities($query_string), '">';
    ?>
    
  • Using your favorite programming language... Please extend this answer, by posting a comment or by editing it directly, with the function you use in your programming language ;-)

For more details, see RFC 3986, which specifies the URI syntax. Its Appendix A collects the ABNF grammar, in which the brackets belong to “gen-delims”; they are not among the characters allowed in a path or query string, so they must be %-encoded there.
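A Java sketch of the two approaches in the list above (class and method names are hypothetical): a literal replacement that mirrors the sed one-liner, and `URLEncoder.encode` applied to each query value. `URLEncoder` produces `application/x-www-form-urlencoded` output, so a space becomes `+`; it suits query-string values, not whole URLs or path segments. The `Charset` overload used here requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class BracketEncoding {
    // Mirrors the sed one-liner: replace only '[' and ']' in an already-built URL.
    static String replaceBrackets(String url) {
        return url.replace("[", "%5B").replace("]", "%5D");
    }

    // Form-encodes a single query value before it is appended to the URL.
    static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(replaceBrackets("http://example.com?day=[0-3][0-9]"));
        // http://example.com?day=%5B0-3%5D%5B0-9%5D
        System.out.println("http://example.com?day=" + encodeQueryValue("[0-3][0-9]")
                + "&month=" + encodeQueryValue("[0-1][0-9]"));
        // http://example.com?day=%5B0-3%5D%5B0-9%5D&month=%5B0-1%5D%5B0-9%5D
    }
}
```

Like the sed version, `replaceBrackets` rewrites every bracket, including those of an IPv6 literal host such as `[::1]`, so use it only on URLs where that cannot occur.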

oHo

I know this question is a bit old, but I just wanted to note that PHP uses brackets to pass arrays in a URL.

http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3

In this case $_GET['bar'] will contain array('1', '2', '3'); the values arrive as strings.
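To make the convention concrete, here is a hypothetical Java sketch (not PHP's actual parser) that groups `key[]=value` pairs into lists the way PHP fills `$_GET`. It decodes keys before looking for the `[]` suffix, so the encoded form `bar%5B%5D=1` is treated exactly like `bar[]=1`, consistent with the comment that PHP still interprets encoded brackets correctly. It handles only the flat `name[]` case, not nested or indexed keys such as `a[x][y]` or `a[2]`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PhpStyleQuery {
    // Hypothetical parser covering only PHP's flat "name[]" convention.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawKey = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decode before inspecting the key, so "bar%5B%5D" and "bar[]" match.
            String key = URLDecoder.decode(rawKey, StandardCharsets.UTF_8);
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            if (key.endsWith("[]")) {
                // "bar[]" appends to the list stored under "bar".
                String name = key.substring(0, key.length() - 2);
                result.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            } else {
                // A plain key keeps only its last value, as in PHP.
                result.put(key, new ArrayList<>(List.of(value)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("bar[]=1&bar[]=2&bar[]=3"));             // {bar=[1, 2, 3]}
        System.out.println(parse("bar%5B%5D=1&bar%5B%5D=2&bar%5B%5D=3")); // {bar=[1, 2, 3]}
    }
}
```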

MM.
  • Correct, but they should still be encoded when the browser isn't doing so automatically. PHP will still correctly interpret the brackets, and its own http_build_query() function encodes them as well. – Wilco Jul 28 '12 at 00:03
  • @Wilco opinions aside, they are not encoded by many PHP frameworks – That Realty Programmer Guy Sep 23 '21 at 23:52

Pretty much the only characters not allowed in path names are # and ?, as they signify the end of the path.

The URL RFC has the definitive answer:

http://www.ietf.org/rfc/rfc1738.txt

Unsafe:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

The answer is that they should be percent-encoded (hex-encoded), but given Postel's law, most software will accept them verbatim.
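A hypothetical Java helper that applies the quoted rule literally by percent-encoding the characters the passage lists as unsafe. It is meant for a single component such as a path segment: run over a whole URL, encoding `#` and `%` would destroy a real fragment delimiter or double-encode existing escapes. Note also that RFC 3986 later moved `~` into the unreserved set, so modern URIs do not need it encoded.

```java
class UnsafeCharEncoder {
    // The characters the RFC 1738 passage above calls unsafe:
    // space < > " # % { } | \ ^ ~ [ ] `
    private static final String UNSAFE = " <>\"#%{}|\\^~[]`";

    // Percent-encodes every unsafe character in one URL component.
    // Non-ASCII characters are not handled and are left as-is.
    static String encodeUnsafe(String component) {
        StringBuilder sb = new StringBuilder(component.length());
        for (int i = 0; i < component.length(); i++) {
            char c = component.charAt(i);
            if (UNSAFE.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeUnsafe("file[3].html")); // file%5B3%5D.html
        System.out.println(encodeUnsafe("a b#c"));        // a%20b%23c
    }
}
```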

1729
  • All unsafe characters MUST always be encoded within a URL. MUST, not SHOULD. – plaugg Jul 24 '12 at 08:06
  • @plaugg clients exist in the real world, as does the context of this answer's "should", whereas the spec exists in a formalized system of information. I.e., to be formally compliant, yes, you MUST encode them; in reality, however, one can only venture an *opinion* that you *should* encode them, as they will work just fine in the general case. In fact, the spec should be altered to reflect usage. – That Realty Programmer Guy Sep 23 '21 at 23:47

Any browser or web-enabled software that accepts such URLs without throwing an exception is almost certainly encoding the special characters behind the scenes. Curly brackets, square brackets, spaces, etc. all have percent-encoded representations that avoid conflicts. As the previous answers say, the safest approach is to URL-encode these characters before handing the URL to anything that will try to resolve it.
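The question asks whether to patch the client or to throw. A client can do a controlled version of both: accept what a strict parser accepts, and repair only the known bracket problem, with a warning. Below is a hypothetical Java policy (the class and method names are made up for illustration) that tries the strict `java.net.URI` parser first; only if that fails does it percent-encode `[` and `]` after the authority, log a warning through `java.util.logging`, and retry once. Anything it cannot repair is rejected with an `IllegalArgumentException`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.logging.Logger;

class LenientUrlParser {
    private static final Logger LOG = Logger.getLogger(LenientUrlParser.class.getName());

    // Hypothetical policy: strictly valid input passes unchanged; otherwise
    // '[' and ']' after the authority are percent-encoded and parsing is retried once.
    static URI parse(String raw) {
        try {
            return new URI(raw);
        } catch (URISyntaxException strictFailure) {
            // Keep "scheme://authority" as-is so IPv6 literals such as [::1] survive.
            int schemeSep = raw.indexOf("://");
            int authorityEnd = schemeSep < 0 ? 0 : indexOfAny(raw, "/?#", schemeSep + 3);
            if (authorityEnd < 0) {
                throw new IllegalArgumentException(raw, strictFailure);
            }
            String repaired = raw.substring(0, authorityEnd)
                    + raw.substring(authorityEnd).replace("[", "%5B").replace("]", "%5D");
            try {
                URI uri = new URI(repaired);
                LOG.warning("Repaired non-conforming URL: " + raw + " -> " + repaired);
                return uri;
            } catch (URISyntaxException stillInvalid) {
                // Brackets were not the (only) problem: give up rather than guess.
                throw new IllegalArgumentException(raw, stillInvalid);
            }
        }
    }

    // Index of the first character of s, starting at 'from', that occurs in chars; -1 if none.
    private static int indexOfAny(String s, String chars, int from) {
        for (int i = from; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }
}
```

Leaving the `scheme://authority` prefix untouched is the design choice that matters here: it keeps IPv6 literals such as `http://[::1]/` intact, which a blanket search-and-replace would break.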

Lee
  • This is true for spaces and other special characters, but not for square brackets. When I enter https://www.example.com/?a[]=1 in the address bar, I see the square brackets sent unescaped in the HTTP request. – Franklin Yu Apr 08 '19 at 14:33
  • Chrome (98) and Firefox (97) are not encoding [ and ] characters. – Olivier Masseau Feb 21 '22 at 12:59

Stack Overflow does not seem to encode them:

https://stackoverflow.com/search?q=square+brackets+[url]

Community
Casebash
  • I believe what you're seeing is your browser accepting them as input. However, if you click one of the tabs on the Stack Overflow result page, it does encode the brackets: ...search?tab=newest&q=square%20brackets%20%5burl%5d – Feckmore Oct 30 '14 at 15:53
  • I checked the request header and the location text, and they are not being encoded in Chrome. What kind of test would we have to do to see if they were "tolerated"? – QueueHammer Mar 25 '21 at 03:36

If you are using Commons HttpClient, look at the org.apache.commons.httpclient.util.URIUtil class, specifically its encode() method, and use it to URI-encode the URL before trying to fetch it.
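If you would rather not depend on HttpClient 3.x, a JDK-only sketch uses the multi-argument `java.net.URI` constructor, which quotes characters that are illegal in the component where they appear (`buildUrl` is a hypothetical helper). It expects unencoded parts, because these constructors always quote `%`, and the quoting rules differ per component, so only the path is shown here.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PathEncoding {
    // Builds a URL from already-split, unencoded parts; the multi-argument
    // java.net.URI constructor quotes '[' and ']' (and '%') in the path.
    static String buildUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http", "example.com", "/path/to/file[3].html"));
        // http://example.com/path/to/file%5B3%5D.html
    }
}
```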

rjray

Square brackets are considered unsafe, but the majority of browsers will parse them correctly. That said, it is better to replace square brackets with other characters.

sixtytrees

It is best to URL-encode them, as they are clearly not supported by all web servers. Even when there is a standard, not everyone follows it.

Ben Scheirman

According to the URL specification, square brackets are not valid URL characters.

Here are the relevant snippets:

The "national" and "punctuation" characters do not appear in any productions and therefore may not appear in URLs.

    national     { | } | vline | [ | ] | \ | ^ | ~
    punctuation  < | >

17 of 26