93

w3fools claims that URLs can contain spaces: http://w3fools.com/#html_urlencode

Is this true? How can a URL contain an un-encoded space?

I'm under the impression the request line of an HTTP Request uses a space as a delimiter, being formatted as {the method}{space}{the path}{space}{the protocol}:

GET /index.html HTTP/1.1

Therefore how can a URL contain a space? If it can, where did the practice of replacing spaces with + come from?

Richard JP Le Guen

4 Answers

138

A URL must not contain a literal space. A space must be encoded, either with percent-encoding (%20) or with a different encoding that uses URL-safe characters (such as application/x-www-form-urlencoded, which uses + instead of %20 for spaces).
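The two encodings mentioned above can be compared with a minimal sketch using Python's standard `urllib.parse` module. The sample string is an arbitrary illustration, not something from the question.

```python
from urllib.parse import quote, quote_plus

# Arbitrary sample text containing spaces (an assumption for illustration).
text = "my beautiful page"

# Percent-encoding: each space becomes %20.
percent_encoded = quote(text)

# Form encoding (application/x-www-form-urlencoded): each space becomes +.
form_encoded = quote_plus(text)

print(percent_encoded)  # my%20beautiful%20page
print(form_encoded)     # my+beautiful+page
```

Either way, the resulting string contains no literal space.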

But whether the statement is right or wrong depends on the interpretation: Syntactically, a URI must not contain a literal space and it must be encoded; semantically, a %20 is not a space (obviously) but it represents a space.

Gumbo
  • So... is their criticism inaccurate? – Richard JP Le Guen Mar 26 '11 at 13:46
  • 4
    @Richard JP Le Guen: That depends on how you interpret it: Syntactically, a URI must not contain a literal space and it must be encoded; semantically, a `%20` is not a space (obviously) but it represents a space. – Gumbo Mar 26 '11 at 13:50
  • Ya, that's the best interpretation I can come up with, too. – Richard JP Le Guen Mar 26 '11 at 13:53
  • And +1000000 for citing a source. This question wasn't about technology but rather about credibility and misinformation, yet it took all of 2 minutes to have 3 other unjustified, unreferenced and unproven answers which could just as easily be personal opinions. Thank you. – Richard JP Le Guen Mar 26 '11 at 13:56
21

They are indeed fools. If you look at RFC 3986 Appendix A, you will see that "space" is simply not mentioned anywhere in the grammar for defining a URL. Since it's not mentioned anywhere in the grammar, the only way to encode a space is with percent-encoding (%20).

In fact, the RFC even states that spaces are delimiters and should be ignored:

In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may have to be added to break a long URI across lines. The whitespace should be ignored when the URI is extracted.

and

For robustness, software that accepts user-typed URI should attempt to recognize and strip both delimiters and embedded whitespace.
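A rough sketch of what that recommendation could look like in practice. The helper name `extract_uri` and the choice of delimiters it handles (angle brackets and double quotes) are assumptions for illustration, not something the RFC prescribes as code.

```python
import re


def extract_uri(raw: str) -> str:
    """Hypothetical helper: strip surrounding delimiters and all whitespace."""
    s = raw.strip()
    # Remove one pair of surrounding <...> or "..." delimiters, if present.
    if len(s) >= 2 and ((s[0] == "<" and s[-1] == ">") or (s[0] == '"' and s[-1] == '"')):
        s = s[1:-1]
    # Drop any embedded whitespace (spaces, tabs, line breaks).
    return re.sub(r"\s+", "", s)


# A URI that was wrapped across two lines and enclosed in angle brackets.
wrapped = "<http://www.example.com/a/\n    long/path>"
print(extract_uri(wrapped))  # http://www.example.com/a/long/path
```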

Curiously, the use of + as an encoding for space isn't mentioned in the RFC, although it is reserved as a sub-delimiter. I suspect that its use is either just convention or covered by a different RFC (possibly HTTP).
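To see the difference in practice, here is a small sketch using Python's standard `urllib.parse` functions. The sample strings are made up for illustration. Generic percent-decoding leaves + alone; only the form-style decoders (for application/x-www-form-urlencoded data) turn it into a space.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Made-up sample containing both a + and a %20.
encoded = "a+b%20c"

# Generic percent-decoding: %20 becomes a space, + stays a literal plus.
generic = unquote(encoded)

# Form-style decoding: both + and %20 become spaces.
form = unquote_plus(encoded)

# Query-string parsing applies the form-style rule to parameter values.
params = parse_qs("q=my+beautiful%20page")

print(generic)  # a+b c
print(form)     # a b c
print(params)   # {'q': ['my beautiful page']}
```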

Gabe
  • 6
    The character `+` is not translated into a space (or vice versa) by any part of the HTTP request process in the general case. It is, however, translated into a space when encountered as the value of a parameter in an "application/x-www-form-urlencoded" query string, and often preferred by browser software over `%20`, for the sake of brevity, when such query strings are appended to request URIs. Of course, the HTTP server may also choose to treat `+` as equivalent to space within URI paths, but that's not specified by the standard. – Mark Reed Feb 19 '13 at 04:55
  • However! The same standard, on the same page, also mentions: "Using <> angle brackets around each URI is especially recommended as a delimiting style for a reference that contains embedded whitespace." So how about that? – Brandon Kuczenski May 05 '17 at 19:10
16

Spaces are simply replaced by "%20", like this:

http://www.example.com/my%20beautiful%20page

Sandro Munda
4

The information there is, I think, only partially correct:

That's not true. An URL can use spaces. Nothing defines that a space is replaced with a + sign.

As you noted, a URL cannot contain spaces. The HTTP request would get screwed over. I'm not sure where the + is defined, though %20 is standard.
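A minimal sketch of why the request gets screwed over, assuming a strict parser that splits the request line on single spaces. `parse_request_line` is a hypothetical helper, and real servers may be more lenient than this.

```python
def parse_request_line(line: str):
    """Hypothetical strict parser: method, target and version separated by single spaces."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: expected 3 parts, got {len(parts)}")
    method, target, version = parts
    return method, target, version


# An encoded space keeps the request line at exactly three parts.
ok = parse_request_line("GET /my%20beautiful%20page HTTP/1.1")
print(ok)  # ('GET', '/my%20beautiful%20page', 'HTTP/1.1')

# A literal space in the path yields five parts, so parsing fails.
try:
    parse_request_line("GET /my beautiful page HTTP/1.1")
    broken = False
except ValueError as err:
    broken = True
    print(err)  # malformed request line: expected 3 parts, got 5
```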

orlp