
Is a URI (specifically an HTTP URL) allowed to contain one or more space characters? If a space must be encoded, is + just a commonly followed convention, or a legitimate alternative to %20?

In particular, can someone point to an RFC that indicates that a URL with a space must be encoded?

Motivation for question: While beta-testing a web site, I noted that some URLs were constructed with spaces in them. Firefox seemed to do the right thing, which surprised me! But I wanted to be able to point the developers to an RFC so that they would feel the need to fix those URLs.

DavidRR
Joe Casadonte

10 Answers


As per RFC 1738:

Unsafe:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.
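As a rough illustration of the rule quoted above, here is a minimal Python sketch that percent-encodes only the space and the characters RFC 1738 lists as unsafe. The helper name `encode_unsafe` is hypothetical, the input is assumed to be ASCII, and it ignores control and reserved characters. Note also that RFC 3986 later moved "~" into the unreserved set, so modern encoders leave it alone.

```python
# Space plus the characters RFC 1738 calls "unsafe".
# "%" is included because it introduces a %XX escape.
RFC1738_UNSAFE = frozenset(' <>"#%{}|\\^~[]`')


def encode_unsafe(text: str) -> str:
    """Percent-encode only RFC 1738 unsafe characters (hypothetical helper; ASCII input assumed)."""
    return "".join(
        f"%{ord(ch):02X}" if ch in RFC1738_UNSAFE else ch for ch in text
    )


print(encode_unsafe("my file#1.txt"))  # my%20file%231.txt
```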

nhahtdh
Marc Novakowski
  • RFC 1738 has been superseded by RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt). That is the current URI specification. It does not matter in this case, though. – Steve Severance Jan 31 '09 at 19:14
  • And 2396 has been superseded by 3986. Many people get this wrong, because RFCs are immutable and thus do not tell the reader that they have been obsoleted. Hint: use http://tools.ietf.org/html/rfcnnnn instead (for example, http://tools.ietf.org/html/rfc2396); it displays the missing metadata at the top. – Julian Reschke Feb 01 '09 at 14:41

Shorter answer: no, you must encode a space; it is correct to encode a space as +, but only in the query string; in the path you must use %20.
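A minimal sketch of the path-versus-query distinction using Python's standard `urllib.parse`: `quote()` turns a space into %20, while `urlencode()` (which calls `quote_plus()` by default) uses the `+` convention of HTML form encoding. The path segment and parameter values are made-up examples.

```python
from urllib.parse import quote, urlencode

path_segment = "my file.html"  # example value
params = {"q": "foo bar"}      # example value

# quote() encodes a space as %20, the form required in the path.
path = "/docs/" + quote(path_segment)
# urlencode() uses quote_plus() by default, so spaces in values become "+".
query = urlencode(params)

print(path + "?" + query)  # /docs/my%20file.html?q=foo+bar
```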

Peter Hilton
  • Hi, I am confused too: sometimes I see books use "+" and sometimes "%20". Can you show an example of each? When a user submits a form, how does the form encode the space, and with which character? – Sam YC Nov 07 '12 at 06:29
  • See [this answer](http://stackoverflow.com/a/1211256/1497596) for additional detail. – DavidRR Sep 17 '14 at 12:45
  • What about the fragment/hash part? How should spaces be encoded there? – humkins Dec 19 '14 at 13:29
  • @gumkins: the fragment (# and after) is not sent to the server. In practice, you can use %20 or + anywhere to encode a space. – Julien Sep 12 '15 at 20:39

Why does it have to be encoded? A request looks like this:

GET /url HTTP/1.1
(Ignoring headers)

The request line has 3 fields separated by single spaces. If you put a space in your URL:

GET /url end_url HTTP/1.1

You now have 4 fields, and the HTTP server will tell you it is an invalid request.

GET /url%20end_url HTTP/1.1

3 fields => valid
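To make the field-count argument concrete, here is a simplified Python sketch of how a server might split a request line (method, request target, and HTTP version separated by single spaces). `parse_request_line` is a hypothetical helper; a real server also handles the trailing CRLF and validates each field.

```python
def parse_request_line(line):
    """Split "METHOD TARGET VERSION" on single spaces (hypothetical, simplified).

    Assumes the trailing CRLF was already stripped; only the field count
    is checked.
    """
    fields = line.split(" ")
    if len(fields) != 3:
        raise ValueError(f"invalid request line: {len(fields)} fields")
    method, target, version = fields
    return method, target, version


print(parse_request_line("GET /url%20end_url HTTP/1.1"))
# ('GET', '/url%20end_url', 'HTTP/1.1')

try:
    parse_request_line("GET /url end_url HTTP/1.1")
except ValueError as err:
    print(err)  # invalid request line: 4 fields
```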

Note: in the query string (after ?), a space is usually encoded as +:

GET /url?var=foo+bar HTTP/1.1 

rather than

GET /url?var=foo%20bar HTTP/1.1 
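Regarding the `+` form: a literal plus sign in a value is itself percent-encoded (as %2B), so it cannot be mistaken for an encoded space. A short Python sketch with the standard library's `urlencode` and `parse_qs`, reusing the example values above:

```python
from urllib.parse import parse_qs, urlencode

# Encoding: a space becomes "+", a literal "+" becomes %2B.
print(urlencode({"var": "foo bar"}))  # var=foo+bar
print(urlencode({"var": "foo+bar"}))  # var=foo%2Bbar

# Decoding reverses both cases.
print(parse_qs("var=foo+bar"))    # {'var': ['foo bar']}
print(parse_qs("var=foo%2Bbar"))  # {'var': ['foo+bar']}
```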
Julien
  • What if var really was "foo+bar" and not "foo bar"? – Ivo3185 Sep 11 '15 at 15:40
  • I would argue that's a requirement of the transport layer, not of the URI specification itself. GET is clearly a property of the http: specification, not the URL specification. Similarly, you could argue that quotes in URLs "must" be encoded because otherwise web pages would break. But that's a property of HTML formatting limitations (which there are other strategies against), not a property of the URL specification. – Kent Fredric Jan 23 '16 at 03:00
  • https://www.ietf.org/rfc/rfc1738.txt - Unsafe characters (including space) should be encoded. – Julien Jan 25 '16 at 05:23
  • @KentFredric This is more likely the *presentation* layer, not the *transport* layer. As *Julien* (almost) writes, the original URI spec ([RFC 1630](https://tools.ietf.org/html/rfc1630)) contains this restriction, so it's a part of the URI specification itself regardless of your personal feelings. Since the URI spec was written *after* the HTTP drafts, it's very possible that URIs were designed with HTTP in mind, including the prohibition against the use of spaces, but it doesn't really matter, does it? The truth is that the spec is what the spec is. – Christopher Schultz Apr 27 '18 at 21:31

URLs are defined in RFC 3986. Other RFCs are relevant as well, but RFC 1738 is obsolete.

They may not contain spaces, along with many other characters. Since those forbidden characters often need to be represented somehow, there is a scheme for encoding them into a URL: each byte is replaced by a "%" followed by its two-digit hexadecimal value (so a space, ASCII 0x20, becomes %20).
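The encoding scheme can be sketched in a few lines of Python. This hypothetical `percent_encode` helper assumes UTF-8 for non-ASCII text and encodes every byte outside RFC 3986's unreserved set; for these inputs it should agree with `urllib.parse.quote(text, safe="")` in Python 3.7+.

```python
# RFC 3986 "unreserved" characters: these never need percent-encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Replace every byte outside the unreserved set with %XX (hypothetical helper).

    Non-ASCII text is assumed to be UTF-8, the usual modern choice.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Only ASCII bytes can be unreserved; everything else becomes %XX.
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a b"))   # a%20b
print(percent_encode("café"))  # caf%C3%A9
```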

Most programming languages/platforms provide functions for encoding and decoding URLs, though they may not properly adhere to the RFC standards. For example, I know that PHP does not.

Rob Williams

A URL can have a space character in it, and it will be displayed as %20 in most browsers, but browser encoding rules change quite often and we cannot depend on how a browser will display the URL.

So instead, you can replace the space character in the URL with any character that you think makes the URL more readable and 'pretty' ;). Commonly preferred characters are "-", "_", and "+", but these aren't mandatory, so you can use any character that is not already supposed to be in the URL.

Please avoid %, &, }, {, ], [, /, >, and < as space replacements, as they can cause errors on certain browsers and platforms.

As you can see, Stack Overflow itself uses the '-' character as a replacement for spaces (%20).

Happy questioning!
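For illustration only, a minimal Python sketch of the hyphen-replacement approach. `slugify` is a hypothetical helper, not Stack Overflow's actual algorithm: it lowercases the title, keeps runs of ASCII letters and digits, and joins them with "-", so punctuation and non-ASCII letters are dropped.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical slug helper: lowercase, keep [a-z0-9] runs, join with "-"."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


print(slugify("Is a URL allowed to contain a space?"))
# is-a-url-allowed-to-contain-a-space
```

Dropping characters rather than encoding them keeps the slug free of %XX escapes; the trade-off is that the slug alone cannot be mapped back to the original title, so it is usually paired with an ID.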


Yes, though the space is usually encoded as "%20". Any parameters passed in a URL should be encoded, simply for safety reasons.

user54650

URLs should not have spaces in them. If you need to address a resource whose name does contain one, use the encoded value %20.
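If a URL has already been built with raw spaces (the situation described in the question), a rough Python sketch like the following can repair the path. `fix_spaces` is a hypothetical helper: it keeps "/" and "%" unescaped so segment separators and existing %XX escapes survive, and it leaves the query and fragment untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def fix_spaces(url: str) -> str:
    """Percent-encode raw spaces (and other unsafe characters) in the path only."""
    parts = urlsplit(url)
    # "/" keeps segment separators; "%" keeps escapes that are already present.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(fix_spaces("http://example.com/my docs/a%20b.html"))
# http://example.com/my%20docs/a%20b.html
```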

Chris Ballance

Can someone point to an RFC indicating that a URL with a space must be encoded?

URIs, and thus URLs, are defined in RFC 3986.

If you look at the grammar defined there, you will eventually notice that a space character can never be part of a syntactically legal URL; thus the term "URL with a space" is a contradiction in itself.
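A character-level check along the lines of that grammar, sketched in Python: it accepts only RFC 3986 unreserved, gen-delims, and sub-delims characters plus "%". `has_only_uri_chars` is a hypothetical name, and the check is deliberately incomplete; it does not parse the full URI grammar.

```python
import string

# Characters that may appear literally in a URI per RFC 3986:
# unreserved, gen-delims, sub-delims, plus "%" for pct-encoded triplets.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = frozenset(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")


def has_only_uri_chars(uri: str) -> bool:
    """Character-level check only; it does not verify that each "%"
    is followed by two hex digits or that the components are well-formed."""
    return all(ch in ALLOWED for ch in uri)


print(has_only_uri_chars("http://example.com/a%20b"))  # True
print(has_only_uri_chars("http://example.com/a b"))    # False
```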

Julian Reschke

To answer your question: I would say it's fairly common for applications to replace spaces in values that will be used in URLs. The reason is usually to avoid the harder-to-read percent (URI) encoding that would otherwise occur.

Check out the Wikipedia article on percent-encoding.

Funk Forty Niner
Eric Schoonover

Firefox 3 will display %20s in URLs as spaces in the address bar.

Sophie Alpert
  • This is not a proper answer to a pretty straightforward question: *`"Is a URL allowed to contain a space?"`*. It is more of a comment. – Roko C. Buljan Jul 22 '19 at 22:42