
I have a Spring application that receives a request like http://localhost/foo?email=foo+bar@example.com. This triggers a controller that roughly looks like this:

@RestController
@RequestMapping("/foo")
public class FooController extends Controller {
    @GetMapping
    public void foo(@RequestParam("email") String email) {
        System.out.println(email);
    }
}

By the time I can access email, it has been converted to foo bar@example.com instead of the original foo+bar@example.com. According to "When to encode space to plus (+) or %20?", this should only happen in requests where the content is application/x-www-form-urlencoded. My request has a content type of application/json. The full MIME headers of the request look like this:

=== MimeHeaders ===
accept = application/json
content-type = application/json
user-agent = Dashman Configurator/0.0.0-dev
content-length = 0
host = localhost:8080
connection = keep-alive

Why is Spring then decoding the plus as a space? And if this is the way it should work, why isn't it encoding pluses as %2B when making requests?

I found this bug report about it: https://jira.spring.io/browse/SPR-6291, which may imply that this was fixed in version 3.0.5, and I'm using Spring > 5.0.0. It is possible that I am misinterpreting something about the bug report.

I also found this discussion about RestTemplate treatment of these values: https://jira.spring.io/browse/SPR-5516 (my client is using RestTemplate).

So, my questions are, why is Spring doing this? How can I disable it? Should I disable it or should I encode pluses on the client, even if the requests are json?

Just to clarify, I'm using neither HTML nor JavaScript anywhere here. There's a Spring Rest Controller, and the client is Spring's RestTemplate with UriTemplate or UriComponentsBuilder, neither of which encodes the plus sign the way Spring decodes it.

Pablo Fernandez
  • I think the decoding is correct. Shouldn't you be sending `%2b` if you want to send `+` as part of the value? `+` as such is meant to be a `space`, which is what you are getting here. The issue you posted is about URL resolving, not param resolving – Tarun Lalwani May 18 '18 at 09:15
  • @TarunLalwani: `+` means space in `application/x-www-form-urlencoded`. I'm sending `application/json` where + doesn't have a specific meaning. – Pablo Fernandez May 18 '18 at 09:16
  • You are mixing 2 things, a `+` in the body of the request would mean a space when header has `application/x-www-form-urlencoded`. As of now what we are discussing is a url and a url doesn't need to be dependent on `content-type` at all? – Tarun Lalwani May 18 '18 at 09:18
  • Also if you want to change I believe you need to configure the filters like in https://stackoverflow.com/a/28214811/2830850 – Tarun Lalwani May 18 '18 at 09:20
  • @TarunLalwani: the URI RFC makes no mention, as far as I can see, of `+` needing to be encoded as `%2b`: https://tools.ietf.org/html/rfc3986. That is defined in HTML4: https://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1. This bug report is relevant: https://jira.spring.io/browse/SPR-6296. Again, I might be confused here and I see your point of why would the URL be content type dependent. – Pablo Fernandez May 18 '18 at 09:26
  • I just tried in Flask Python to see what happens there and same thing happens, so I don't think it is wrong from a URI perspective https://i.stack.imgur.com/fl3tx.png – Tarun Lalwani May 18 '18 at 10:27
  • Also see the URL Encoding section in https://en.wikipedia.org/wiki/Query_string – Tarun Lalwani May 18 '18 at 10:32

3 Answers


Original Answer

You are mixing two things. A + in the body of the request means a space when the header says application/x-www-form-urlencoded. The body or content of the request depends on the headers, but a request can consist of just a URL, with no headers and no body.

So the encoding of a URI cannot be controlled by any headers as such.

See the URL Encoding section in https://en.wikipedia.org/wiki/Query_string

Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document. In HTML forms, the character = is used to separate a name from a value. The URI generic syntax uses URL encoding to deal with this problem, while HTML forms make some additional substitutions rather than applying percent encoding for all such characters. SPACE is encoded as '+' or "%20".[10]

HTML 5 specifies the following transformation for submitting HTML forms with the "get" method to a web server. The following is a brief summary of the algorithm:

  • Characters that cannot be converted to the correct charset are replaced with HTML numeric character references[11]
  • SPACE is encoded as '+' or '%20'
  • Letters (A–Z and a–z), numbers (0–9) and the characters '*','-','.' and '_' are left as-is
  • All other characters are encoded as %HH hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)

The octet corresponding to the tilde ("~") is permitted in query strings by RFC3986 but required to be percent-encoded in HTML forms to "%7E".

The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 3986.
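
The summary quoted above can be sketched in plain Java. This is only an illustration of the quoted rules, not what any browser or framework actually runs: the class name `FormEncodingSketch` and the method `encode` are made up for this example, it always uses UTF-8 (so the numeric-character-reference step never applies), and it always picks '+' for SPACE.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, written only to illustrate the quoted summary.
final class FormEncodingSketch {

    // Encodes a single value: unreserved form characters stay as-is,
    // SPACE becomes '+', every other byte becomes %HH (UTF-8 bytes).
    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean asIs = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '*' || c == '-' || c == '.' || c == '_';
            if (asIs) {
                out.append((char) c);
            } else if (c == ' ') {
                out.append('+');
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("foo bar@example.com"));
        System.out.println(encode("foo+bar@example.com"));
    }
}
```

Under these rules a literal plus sign can never appear unescaped in an encoded value, which is why a decoder following the same convention reads a bare + as a space.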

And you can see the same behaviour on google.com as well in the screenshots below:

[Screenshot: + translated to space]

[Screenshot: %2B translated to +]

You can see the same behaviour in other frameworks too. Below is an example using Python Flask:

[Screenshot: Flask demo]

So what you are seeing is correct; you are just comparing it with a document that describes the body content of a request, not the URL.

Edit-1: 22nd May

After debugging, it seems the decoding doesn't even happen in Spring. It happens in the UDecoder class of the org.apache.tomcat.util.buf package:

/**
 * URLDecode, will modify the source.
 * @param mb The URL encoded bytes
 * @param query <code>true</code> if this is a query string
 * @throws IOException Invalid %xx URL encoding
 */
public void convert( ByteChunk mb, boolean query )
    throws IOException
{
    int start=mb.getOffset();

And below is where the conversion actually happens:

    if( buff[ j ] == '+' && query) {
        buff[idx]= (byte)' ' ;
    } else if( buff[ j ] != '%' ) {

This means that the embedded Tomcat server does this translation and Spring doesn't even participate in it. As far as I can see in the class code, there is no configuration option to change this behaviour, so you have to live with it.
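
To make the effect of that branch concrete, here is a minimal standalone sketch of the same idea. It is not Tomcat's code: the class `PlusDecodingSketch`, its `decode` method and the error handling are assumptions made for illustration. Only the `query` flag and the `'+'` to `' '` rule mirror the excerpt above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified decoder; it only mimics the branch shown above.
final class PlusDecodingSketch {

    // Decodes %xx escapes; turns '+' into a space only when query is true.
    static String decode(String raw, boolean query) {
        byte[] in = raw.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b == '+' && query) {
                out.write(' ');          // the translation discussed above
            } else if (b != '%') {
                out.write(b);            // ordinary byte, copied unchanged
            } else {
                if (i + 2 >= in.length) {
                    throw new IllegalArgumentException("Truncated %xx escape");
                }
                int hi = Character.digit((char) in[i + 1], 16);
                int lo = Character.digit((char) in[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid %xx escape");
                }
                out.write((hi << 4) | lo);
                i += 2;                  // skip the two hex digits
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("foo+bar@example.com", true));
        System.out.println(decode("foo%2Bbar@example.com", true));
        System.out.println(decode("foo+bar@example.com", false));
    }
}
```

With `query` set to `true` the first call yields a space and the second yields a literal plus; with `query` set to `false` the plus is left alone, which matches the different treatment of query strings and paths discussed in this thread.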

msp
Tarun Lalwani
  • What do you mean by comparing it to a document which refers to the body content of a request? – Pablo Fernandez May 22 '18 at 08:49
  • @pupeno, I meant to say that in response to `According to When to encode space to plus (+) or %20? this should only happen in requests where the content is application/x-www-form-urlencoded` in the question – Tarun Lalwani May 22 '18 at 08:52
  • I don't think Wikipedia is the authoritative answer in this matter, I'd prefer to refer to RFCs. But, in that snippet you copied from wikipedia, it says that HTML 5 makes those transformation. I'm not touching HTML anywhere. – Pablo Fernandez May 22 '18 at 08:52
  • For example, neither `UriTemplate` nor `UriComponentsBuilder` escape plus signs: https://stackoverflow.com/questions/50432395/whats-the-proper-way-to-escape-url-variables-with-springs-resttemplate – Pablo Fernandez May 22 '18 at 08:53
  • Let me see if I can dig some RFC related to same – Tarun Lalwani May 22 '18 at 09:07
  • I think this bug report sheds some light onto this issue: https://jira.spring.io/browse/SPR-11047 – Pablo Fernandez May 22 '18 at 09:15
  • If you see https://tools.ietf.org/html/rfc3986#section-2.2. There are two parts. It says `gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"` and `sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="` And then it says `URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.`. So the `RFC` does list `+` as reserved sub delimiter. That's the sense I am getting reading the RFC section. Your thoughts? – Tarun Lalwani May 22 '18 at 09:21
  • Yes, I see that. In essence, that says that the plus sign shouldn't be in the URL, that it should have been encoded, but it doesn't say that a plus sign is the encoding of a space. It does sound like Spring should reject those URLs, the ones with plus signs, as invalid. The problem I'm having is that Spring treats plus signs differently depending on whether it's the client or the server; it's self-inconsistent. – Pablo Fernandez May 22 '18 at 09:30
  • A similar issue on client side was this https://stackoverflow.com/questions/48906034/generated-swagger-rest-client-does-not-handle-character-correctly-for-query-pa/49005785#49005785. I know its not consistent, but there is no clear definition of how the behaviour should be for decoding as you pointed out – Tarun Lalwani May 22 '18 at 09:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171530/discussion-between-tarun-lalwani-and-pupeno). – Tarun Lalwani May 22 '18 at 10:02

SPR-6291 fixed this problem in v3.0.5, but it remains unresolved in some other cases, such as SPR-11047. While SPR-6291's priority was Major, SPR-11047's priority is Minor.

SPR-11047

I faced this problem when I was working on a REST API in an old version of Spring last year. There are multiple ways to get data into a Spring controller; two of them are the @RequestParam and @PathVariable annotations.

As others mentioned, I think it's an internal Spring issue rather than strictly a URL-encoding one, because I was sending data in a POST request, although it is still an encoding problem of sorts. I also agree with the others that it now remains problematic only in the URL.

So there are two solutions I know:

  1. You can use @PathVariable instead of @RequestParam, because the plus sign issue was fixed for @PathVariable in SPR-6291 but remains open for @RequestParam as SPR-11047.

  2. My version of Spring did not even accept a plus sign via the @PathVariable annotation, so this is how I overcame the problem (I don't remember it step by step, but it should give you a hint).

In your case you can get the fields via JS and escape the plus sign before sending the request. Something like this:

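var email = document.getElementById("emailField").value;
// A string pattern replaces only the first match, so use a global regex to escape every plus sign.
email = email.replace(/\+/g, '%2B');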
UsamaAmjad
  • Wow... so, Spring's `UriTemplate` and `UriComponentsBuilder` is not consistent with Spring's parsing of URIs and even the Spring parsing of URIs is inconsistent with itself using different rules for the paths and the request parameters. – Pablo Fernandez May 22 '18 at 09:05
  • @usama For JS the OP should use [encodeURI()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI). – edixon May 22 '18 at 09:50
  • @pupeno I also found this `uri` problem in `.Net` don't know if it is still buggy. It seems like a common problem due to the plus sign decoding in uri – UsamaAmjad May 22 '18 at 09:58
  • By the way, the last bit doesn't apply to me. I'm not using JavaScript anywhere. – Pablo Fernandez May 22 '18 at 10:09
  • Since you are using spring >5.0 `@PathVariable` should work for you. – UsamaAmjad May 22 '18 at 10:14

If you have this request:

http://localhost/foo?email=foo+bar@example.com

then the original is foo bar@example.com. If you say the original should be foo+bar@example.com then the request should be:

http://localhost/foo?email=foo%2Bbar@example.com

So Spring is working as it is supposed to. Maybe you should check on the client whether the URI is properly encoded. The client-side URL encoding is responsible for building a correct HTTP request.

See encodeURI() if you generate the request in JavaScript or toUriString() if you generate the request in Spring.

Build your request string (the part after ?) without any encoding, using unencoded values like foo+bar@email.com, and only at the end, just before using it in the GET, encode all of it with whatever is available on the client platform. If you want to use POST, then you should encode it according to the MIME type of your choice.
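
As a rough illustration for a Java client, the sketch below encodes each parameter name and value with the JDK's `java.net.URLEncoder` before joining them into the query string. The class `QueryValueEncodingSketch` and the `buildUrl` helper are hypothetical names invented for this example; the `Charset` overloads of `encode` and `decode` assume Java 10 or later. Note that it encodes the pieces individually rather than the assembled string, so the `?`, `=` and `&` separators are never escaped.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only as one possible way to encode on the client.
final class QueryValueEncodingSketch {

    // Builds base?name=value with name and value form-encoded individually.
    static String buildUrl(String base, String name, String value) {
        return base + "?"
                + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildUrl("http://localhost/foo", "email", "foo+bar@example.com");
        System.out.println(url);

        // Decoding the value again recovers the original plus sign.
        String encodedValue = url.substring(url.indexOf('=') + 1);
        System.out.println(URLDecoder.decode(encodedValue, StandardCharsets.UTF_8));
    }
}
```

One caveat, stated as an assumption rather than a guarantee: a string that is already percent-encoded may get encoded a second time if it is handed to RestTemplate as a String URL template, so wrapping it with `java.net.URI.create(url)` before the call is probably the safer route.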

edixon
  • Spring seems to parse plus signs as spaces, but not encode pluses as spaces: https://stackoverflow.com/questions/50432395/whats-the-proper-way-to-escape-url-variables-with-springs-resttemplate – Pablo Fernandez May 22 '18 at 08:56
  • @pupeno The OP of that question has the same issue as you. He uses `.fromUriString()`, which correctly decodes `+` into a space. However, he does not use `.toUriString()` to get the correctly encoded request. – edixon May 22 '18 at 09:41
  • By the way, I'm not using JavaScript anywhere. – Pablo Fernandez May 22 '18 at 10:03
  • How about sending the request in its body (form-data)? How do you encode the `+` sign in that case? – Nguyễn Đức Tâm May 25 '22 at 06:26
  • The '+' is only relevant for URIs because it is the encoding for ' ' (space char), which is not allowed in the URI. In bodies, both ' ' and '+' can be left as they are. In form-data the only special chars are '&' (that separates the tuples) and '=' (that separates key from value). See https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST – edixon May 26 '22 at 11:10