I have a Spring application that is experiencing encoding issues. When the client submits "São Paulo", the request URL arrives as:

```
=============>>> url is: /users/1825220/activity=update_fields&hometown=S%C3%A3o%20Paulo&usrId=1234 (PUT)
```
That line is written by logging the request as it comes in:

```java
logger.info("\n=============>>> url is: " + request.getRequestURI() + "/" + request.getQueryString() + " (" + request.getMethod() + ")");
```
The request is then passed to this handler method:

```java
@RequestMapping(value = "/users/{id}", method = RequestMethod.PUT)
public @ResponseBody OperationResponse updateUser(HttpServletRequest request,
        @PathVariable("id") Integer id,
        @RequestParam(value = "hometown", required = false) String homeTown)
        throws NoSuchAlgorithmException, UnsupportedEncodingException {
    // ...
}
```
When I log the bound value:

```java
logger.debug("HOMETOWN=" + homeTown);
```

I get:

```
HOMETOWN=São Paulo
```
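One way to tell whether the String itself is wrong, or only its rendering in the log, is to log its code points instead of its characters. A minimal sketch in plain Java; `describeCodePoints` is a hypothetical helper, not part of Spring or any logging library:

```java
import java.util.stream.Collectors;

public class CodePointDump {

    // Hypothetical helper (not part of Spring or SLF4J): renders each code point
    // of s as U+XXXX, so the log shows what the String actually contains no
    // matter how the log file or terminal renders non-ASCII characters.
    static String describeCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Correctly decoded: S, a with tilde (U+00E3), o
        System.out.println(describeCodePoints("S\u00e3o"));       // U+0053 U+00E3 U+006F
        // As seen in the log: S, A with tilde (U+00C3), pound sign (U+00A3), o
        System.out.println(describeCodePoints("S\u00c3\u00a3o")); // U+0053 U+00C3 U+00A3 U+006F
    }
}
```

In the controller this would be used as something like `logger.debug("HOMETOWN=" + homeTown + " " + describeCodePoints(homeTown));`. Seeing U+00C3 U+00A3 rather than U+00E3 would mean the value was already mis-decoded before it reached the controller, not merely mis-rendered by the logger.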
I am somewhat familiar with the basics of character encoding, and everything appears to be UTF-8, but evidently I do not know enough to figure this out. I have seen several questions on this topic, some even with the same data, but I have not found one that addresses this exact case with a fix that works.
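The percent-encoded bytes themselves look correct. For example, according to http://www.utf8-chartable.de/, the ã in São is c3 a3 in UTF-8. The relevant rows, including the two characters that appear in the garbled output, are:

| Code point | Char | UTF-8 (hex) | Name |
|------------|------|-------------|------|
| U+00A3 | £ | c2 a3 | POUND SIGN |
| U+00C3 | Ã | c3 83 | LATIN CAPITAL LETTER A WITH TILDE |
| U+00E3 | ã | c3 a3 | LATIN SMALL LETTER A WITH TILDE |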
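The chart values can be checked directly in Java. This sketch uses a hypothetical `hex` helper plus the standard `String.getBytes(Charset)` and `new String(byte[], Charset)` to print the UTF-8 and ISO-8859-1 encodings of ã, and to show what its two UTF-8 bytes become when read back as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

public class TildeBytes {

    // Hypothetical helper: formats bytes as space-separated lowercase hex, e.g. "c3 a3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aTilde = "\u00e3"; // LATIN SMALL LETTER A WITH TILDE

        // UTF-8 needs two bytes for U+00E3; these are what %C3%A3 carries.
        System.out.println(hex(aTilde.getBytes(StandardCharsets.UTF_8)));      // c3 a3
        // ISO-8859-1 needs a single byte; this is what %E3 would carry.
        System.out.println(hex(aTilde.getBytes(StandardCharsets.ISO_8859_1))); // e3

        // Reading the two UTF-8 bytes as ISO-8859-1 yields two characters,
        // U+00C3 followed by U+00A3.
        String misread = new String(aTilde.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(misread.equals("\u00c3\u00a3"));                     // true
    }
}
```

ISO-8859-1 maps every byte 0x00 to 0xFF to the code point with the same value, so c3 a3 is read as Ã followed by £, which is exactly the pair in the log.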
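The incoming values are identical whether the request comes from the native iOS app, the website, or curl. The ã (U+00E3) is percent-encoded as its two UTF-8 bytes (%C3%A3) rather than as the single ISO-8859-1 byte %E3, yet somewhere along the way those two bytes turn into the two characters Ã£. I just can't figure out where the disconnect is.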
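One plausible location for the disconnect, assuming the servlet container decodes query-string parameters as ISO-8859-1 (older Tomcat versions, for example, use that as the default for the connector's `URIEncoding` attribute), is the percent-decoding step that happens before Spring binds `@RequestParam`. The sketch below reproduces both outcomes with `java.net.URLDecoder` (the `decode(String, Charset)` overload needs Java 10 or later). `repairLatin1Misdecode` is a hypothetical per-value workaround, the kind of code change the question would rather avoid:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class QueryDecodeDemo {

    // Hypothetical helper: undoes an ISO-8859-1 mis-decoding of UTF-8 input by
    // turning each char back into its original byte, then decoding those bytes as UTF-8.
    static String repairLatin1Misdecode(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The raw, still percent-encoded value from the logged query string.
        String raw = "S%C3%A3o%20Paulo";

        // Decoding the escaped bytes as UTF-8 turns C3 A3 back into one character, U+00E3.
        String asUtf8 = URLDecoder.decode(raw, StandardCharsets.UTF_8);

        // Decoding them as ISO-8859-1 turns C3 and A3 into two characters,
        // U+00C3 and U+00A3, matching the HOMETOWN log line.
        String asLatin1 = URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals("S\u00e3o Paulo"));                // true
        System.out.println(asLatin1.equals("S\u00c3\u00a3o Paulo"));        // true
        System.out.println(repairLatin1Misdecode(asLatin1).equals(asUtf8)); // true
    }
}
```

If the container really is decoding as ISO-8859-1, the durable fix would sit in its configuration rather than in a helper like this called from every controller.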
Preferably, I would like to fix this with a configuration change somewhere rather than by adding code everywhere the data comes in.