236

I am aware that a + in the query string of a URL represents a space. Is this also the case outside of the query string region? That is to say, does the following URL:

http://a.com/a+b/c

actually represent:

http://a.com/a b/c

(and thus the + needs to be encoded if it should be a literal +), or does it in fact represent a+b/c?

Mad Scientist Moses
  • http://www.w3schools.com/tags/ref_urlencode.asp – Pratik Butani Dec 09 '13 at 05:10
  • 4
    Note that in php urldecode decodes the %2b (encoded +) to a space. To avoid this use `rawurldecode`. I say this here for reference because this is a high rated result on google search for "php url decode breaks on plus symbol". – danielson317 Mar 31 '16 at 16:57
  • 1
    Possible duplicate of [When to encode space to plus (+) or %20?](http://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20) – user Apr 16 '17 at 09:25

6 Answers

239

You can find a nice list of corresponding URL encoded characters on W3Schools.

  • + becomes %2B
  • space becomes %20
Andrew Tobilko
Niels R.
181
Percent-encoding in the path component of a URL is expected to be decoded, but any + characters in the path component are expected to be treated literally.

To be explicit: + is only a special character in the query component.

https://www.rfc-editor.org/rfc/rfc3986
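
As a rough illustration of that rule, here is a minimal Python sketch using the standard `urllib.parse` module. The helper name `decode_url_parts` and the sample URL are made up for this example, and the sketch assumes the query string is form-encoded (application/x-www-form-urlencoded), which is the usual case for HTTP but is not something the URI RFC itself guarantees.

```python
from urllib.parse import urlsplit, unquote, parse_qs

def decode_url_parts(url):
    """Hypothetical helper: decode the path and the query with their own rules."""
    parts = urlsplit(url)
    # Path: only percent-escapes are decoded; "+" stays a literal plus.
    path = unquote(parts.path)
    # Query (assumed to be form-encoded): "+" is decoded to a space.
    query = parse_qs(parts.query)
    return path, query

path, query = decode_url_parts("http://a.com/a+b/c%20d?name=foo+bar&op=1%2B1")
print(path)   # /a+b/c d
print(query)  # {'name': ['foo bar'], 'op': ['1+1']}
```

The point of splitting first is that the decoder is chosen per component: `unquote` for the path, the form decoder for the query.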

Community
Stobor
  • 13
    +1 Unfortunately, many "URL coders/encoders" out there in the wild do not understand this. Eg http://www.sislands.com/coin70/week6/encoder.htm http://www.keyone.co.uk/tools-url-encoder.asp http://meyerweb.com/eric/tools/dencoder/ – leonbloy Jul 15 '10 at 16:01
  • 8
    @Stobor Did the RFC ever state that the `+` character is interpreted as a space in the query component? Or is it simply a rule "from the wild"? – Pacerier Jul 03 '12 at 23:34
  • 48
    @Pacerier and @bukzor: [RFC 1738](http://tools.ietf.org/html/rfc1738) (as modified by 2396 and 3986) defines the scheme (`http:`), authority (`//server.example.com`), and path (`/myfile/mypage.htm`) component, and does not define any special meaning for the `+` character. The HTML spec defines the query component to be mime type [application/x-www-form-urlencoded](http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1) which is defined as "replace spaces with `+` and other special characters as in RFC1738". So it's not "from the wild", but it's from an accepted (non-RFC) standard. – Stobor Jul 04 '12 at 02:52
  • try to pass some+email@example.com as a parameter, and you'll end up with what @Lennart suggested. – valk Aug 23 '12 at 08:04
  • @valk I agree - if you pass in a `+` to an encoder, it will be escaped. However, the question is not about encoding, but interpreting the encoded data. The rules for decoding the data depend on where in the URL the encoding appears. – Stobor Aug 24 '12 at 07:36
  • 2
    The .NET method `Server.UrlEncode` erroneously encodes spaces as plusses in the path portion also, violating HTTP rules. – Suncat2000 Nov 30 '15 at 16:57
  • 1
    As I read [RFC 7230](https://tools.ietf.org/html/rfc7230#section-2.7.3) which defines the HTTP/HTTPS URI scheme, it specifically references the reserved set of [RFC 3696](https://tools.ietf.org/html/rfc3986#section-2.2); therefore `+` along with other reserved delimiter characters *in the path* should be encoded in order to represent their literal value since any given HTTP server may assign them special meaning. – Lawrence Dol Sep 05 '19 at 22:16
  • 1
    @LawrenceDol - You mean (and link to) RFC3986, which states: *... applications should percent-encode ... characters in the reserved set **unless these characters are specifically allowed by the URI scheme to represent data in that component**. If a reserved character is found in a URI component and **no delimiting role is known for that character**, then it must be interpreted as ... **that character's encoding in US-ASCII**.* Further, [Section 3.3 "Path"](https://tools.ietf.org/html/rfc3986#section-3.3) specifically allows `sub-delims` (which includes `+`) anywhere in the path component. – Stobor Sep 06 '19 at 00:46
  • @Stobor : No, I mean RFC 7230 which defines the *HTTP* URI scheme and designates which characters are reserved for that scheme, and which references the (full) reserved set specified in RFC 3696, also linked. Specifically where it states, *"Characters other than those in the "reserved" set are equivalent to their percent-encoded octets: the normal form is to not encode them"*, which pretty definitively indicates that characters ***in*** the reserved set of 3696 must be percent encoded. – Lawrence Dol Sep 10 '19 at 23:00
  • @LawrenceDol [RFC7230](https://tools.ietf.org/html/rfc7230) does not refer to [RFC3696](https://tools.ietf.org/html/rfc3696) anywhere. [RFC7230 section 2.7.3](https://tools.ietf.org/html/rfc7230#section-2.7.3) references sections [2.1](https://tools.ietf.org/html/rfc3986#section-2.1), [2.2](https://tools.ietf.org/html/rfc3986#section-2.2), and [6](https://tools.ietf.org/html/rfc3986#section-6) of RFC3986. – Stobor Sep 11 '19 at 01:49
  • @LawrenceDol However, since you bring up [RFC3696](https://tools.ietf.org/html/rfc3696), it has a section called [4.2 The HTTP URL](https://tools.ietf.org/html/rfc3696#section-4.2), which explicitly states "The characters `/` `;` `?` are reserved within the path and search parts and must be encoded" – Stobor Sep 11 '19 at 01:51
  • 1
    @LawrenceDol Further "People other than those who have lawns normally don't have lawnmowers" does not imply that "people who have lawns must have lawnmowers". – Stobor Sep 11 '19 at 01:53
27

Space characters may only be encoded as "+" in one context: application/x-www-form-urlencoded key-value pairs.

RFC 1866 (the HTML 2.0 specification), paragraph 8.2.1, subparagraph 1, says: "The form field names and values are escaped: space characters are replaced by "+", and then reserved characters are escaped".

Here is an example of such a string in a URL, where RFC 1866 allows encoding spaces as pluses: "http://example.com/over/there?name=foo+bar". So spaces can be replaced by pluses only after the "?" (elsewhere, spaces should be encoded as "%20"). This way of encoding form data is also given in later HTML specifications; for example, see the paragraphs about application/x-www-form-urlencoded in the HTML 4.01 Specification.
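
To make the difference between the two encodings concrete, here is a small Python sketch using the standard `urllib.parse` functions. The sample value is invented for illustration; `quote_plus`/`unquote_plus` implement the form style, while `quote`/`unquote` implement plain percent-encoding.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "foo bar+baz"  # sample value, chosen only for this illustration

form_encoded = quote_plus(value)         # form style: space -> "+", "+" -> "%2B"
percent_encoded = quote(value, safe="")  # generic style: space -> "%20"

print(form_encoded)     # foo+bar%2Bbaz
print(percent_encoded)  # foo%20bar%2Bbaz

# The form decoder round-trips both encodings...
print(unquote_plus(form_encoded))     # foo bar+baz
print(unquote_plus(percent_encoded))  # foo bar+baz
# ...but the generic decoder leaves "+" alone, so form-encoded data is misread.
print(unquote(form_encoded))          # foo+bar+baz
```

Note that "%20" is understood by both decoders, whereas "+" only means a space to the form decoder.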

But because it is hard to always determine the context correctly, it is best practice to never encode spaces as "+". It is better to percent-encode all characters except the "unreserved" ones defined in RFC 3986, section 2.3. Here is a code example that illustrates what should be encoded. It is written in Delphi (Pascal), but it should be easy to follow for a programmer in any language:

(* percent-encode all characters except the unreserved ones defined in RFC-3986, p.2.3 *)
function UrlEncodeRfcA(const S: AnsiString): AnsiString;
const    
  HexCharArrA: array [0..15] of AnsiChar = '0123456789ABCDEF';
var
  I: Integer;
  c: AnsiChar;
begin
 // percent-encoding, see RFC-3986, p. 2.1
  Result := S;
  for I := Length(S) downto 1 do
  begin
    c := S[I];
    case c of
      'A' .. 'Z', 'a' .. 'z', // alpha
      '0' .. '9',             // digit
      '-', '.', '_', '~':;    // rest of unreserved characters as defined in the RFC-3986, p.2.3
      else
        begin
          Result[I] := '%';
          Insert('00', Result, I + 1);
          Result[I + 1] := HexCharArrA[(Byte(C) shr 4) and $F];
          Result[I + 2] := HexCharArrA[Byte(C) and $F];
        end;
    end;
  end;
end;

function UrlEncodeRfcW(const S: UnicodeString): AnsiString;
begin
  Result := UrlEncodeRfcA(Utf8Encode(S));
end;
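
For readers who do not use Delphi, the same approach might look roughly like this in Python. This is a sketch, not a port verified against the Delphi code; the names `UNRESERVED` and `url_encode_rfc` are made up here, and the input is assumed to be a text string that gets encoded as UTF-8 first (as `UrlEncodeRfcW` does).

```python
# RFC 3986, section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def url_encode_rfc(text):
    """Percent-encode every UTF-8 byte that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))               # keep unreserved characters as-is
        else:
            out.append("%{:02X}".format(byte))  # e.g. "+" -> "%2B", " " -> "%20"
    return "".join(out)

print(url_encode_rfc("a+b c/é"))  # a%2Bb%20c%2F%C3%A9
```

On recent Python versions, `urllib.parse.quote(text, safe="")` from the standard library should give the same result, so the hand-written loop is mainly useful for seeing exactly which characters are left alone.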
Zarepheth
Maxim Masiutin
0

Use the encodeURIComponent function to encode the parameter value; it works in browsers and in Node.js.

res.redirect("/signin?email="+encodeURIComponent("aaa+bbb-ccc@example.com"));


> encodeURIComponent("http://a.com/a+b/c")
'http%3A%2F%2Fa.com%2Fa%2Bb%2Fc'
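
For comparison, a Python sketch of the same redirect target, encoding only the parameter value rather than the whole URL. The helper name `signin_redirect` is hypothetical; `quote` comes from the standard `urllib.parse` module.

```python
from urllib.parse import quote

def signin_redirect(email):
    """Hypothetical helper mirroring the redirect above."""
    # safe="" makes quote() escape "+", "@" and "/" too, so the value
    # cannot be mistaken for URL syntax or for an encoded space.
    return "/signin?email=" + quote(email, safe="")

print(signin_redirect("aaa+bbb-ccc@example.com"))
# /signin?email=aaa%2Bbbb-ccc%40example.com
```

Encoding a complete URL this way would also escape its ":" and "/" delimiters, as the second JavaScript call above shows, so it is normally applied to individual components only.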
Baryon Lee
  • 1
    This does not address the question. And, incorrectly encodes URLs, with a specific language (JavaScript) -- depending on the context, you probably don't want to encode where you need special (not literal) slashes (/) and colons(:) for the URL to work. – Gremio Apr 09 '18 at 17:13
  • Thanks it really helped me ! – Simon Arruti Feb 05 '19 at 10:03
-4

Try below:

<script type="text/javascript">

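function resetPassword(email) {
    // Build the request URL; the email value may contain a literal "+"
    return "submitForgotPassword.html?email=" + fixEscape(email);
}
function fixEscape(str)
{
    // escape() leaves "+" untouched, so replace every occurrence, not just the first
    return escape(str).replace(/\+/g, "%2B");
}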
</script>
The Java Guy
  • 2
    I find it very odd that two people up voted this answer. It literally has nothing to do with the question. – Andrew Barber Aug 04 '14 at 05:27
  • 1
    How about for other characters * @ - _ + . / – Ravi Nov 25 '14 at 18:14
  • 1
    @AndrewBarber Why did you find it irrelevant ? + becomes %2B – The Java Guy Apr 29 '15 at 07:09
  • This is wrong for so many reasons... `escape` is deprecated, instead you should use `encodeURI` or in case of the query part `encodeURIComponent`. Also the parameter string should encode according to [w3c](http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1). – Christoph Aug 12 '15 at 14:49
-6

Thou shalt always encode URLs.

Here is how Ruby encodes your URL:

irb(main):008:0> CGI.escape "a.com/a+b"
=> "a.com%2Fa%2Bb"
Lennart Koopmann
  • Sorry, allow me to clarify slightly. If the user types in "http://a.com/a+b/", then this is to be interpreted to mean a%20b and not a%2Bb? – Francisco Ryan Tolmasky I Jun 17 '09 at 08:05
  • 9
    I am not sure that's right. According to RFC2396 (http://www.ietf.org/rfc/rfc2396.txt) plusses are not reserved characters in the path (segments) of the URI, only the query component. That seems to imply that they don't need to be URL encoded and thus shouldn't be interpreted as spaces in the path, only in the query. – tlrobinson Jun 17 '09 at 08:10
  • 3
    rfc 1738 however does treat pluses as spaces. It all depends on which is implemented by your encode/decode functions. for example, in php, rawurlencode follows rfc 1738 whereas urlencode follows rfc 2396. – Jonathan Fingland Jun 17 '09 at 08:19
  • 1
    See, now I have some additional confusion. In the example you gave me above, a.com%2Fa%2Bb is not what I want, it would at the very least be a.com/a%2Bb. This is an actual URL I'm dealing with, not a URL being passed as a parameter in a query string. For a little background that may help to clarify, the Mac OS X Finder is returning file system URLs to me. So if I have a file named "a?+b.txt", it returns something that looks like "file://a%3F+b.txt", NOT "file://a%3F%2Bb.txt". Is the Finder just incorrect, or is a + before the query string actually a plus? – Francisco Ryan Tolmasky I Jun 17 '09 at 08:19
  • 2
    Jonathan: Are you sure 1738 says + is reserved? I see: safe = "$" | "-" | "_" | "." | "+" unreserved = alpha | digit | safe | extra as well as: Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. – tlrobinson Jun 17 '09 at 08:25
  • The encoding you are using is for query parts of a URL. – Sam Stainsby Oct 29 '12 at 06:29
  • 2
    "Thou shalt always escape" needs more qualification, and the answer is irrelevant to the question anyway. – bug Apr 27 '13 at 17:48