
I recently ran into a character-encoding problem. When I sent an HTTP GET request whose query string contained non-ASCII characters, the server could not decode the parameters correctly.

My current solution is to configure Tomcat's server.xml by adding the attribute URIEncoding="UTF-8" to the <Connector> element. That solves the problem. But my question is: what if the URL is not encoded in UTF-8? (For example, it could be encoded in some ANSI code page; that is possible, right?)
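To make the problem concrete, here is a minimal Java sketch (assuming Java 10+, for the `Charset` overloads of `URLEncoder`/`URLDecoder`). It shows that the same value percent-encodes to different bytes under UTF-8 and windows-1252, and what a server gets when it decodes with a different charset than the client used. The class name `CharsetMismatchDemo` and the sample value are illustrative assumptions; windows-1252 stands in for "some ANSI encoding" and ships with standard JDKs, although it is not one of the charsets every Java platform is required to support.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same text, two charsets, different bytes on the wire.
final class CharsetMismatchDemo {

    // Percent-encode a query parameter value using the given charset (client side).
    static String encode(String value, Charset cs) {
        return URLEncoder.encode(value, cs);
    }

    // Decode a percent-encoded value using the charset the server assumes.
    static String decode(String encoded, Charset cs) {
        return URLDecoder.decode(encoded, cs);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with e-acute
        Charset windows1252 = Charset.forName("windows-1252");

        String utf8 = encode(value, StandardCharsets.UTF_8); // "caf%C3%A9"
        String legacy = encode(value, windows1252);          // "caf%E9"
        System.out.println(utf8 + " vs " + legacy);

        // Server assumes UTF-8, client sent UTF-8: the value round-trips.
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
        // Server assumes ISO-8859-1, client sent UTF-8: mojibake ("caf" + U+00C3 U+00A9).
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
        // Server assumes UTF-8, client sent windows-1252: byte 0xE9 alone is not
        // valid UTF-8, so it is replaced with U+FFFD.
        System.out.println(decode(legacy, StandardCharsets.UTF_8));
    }
}
```

The server only ever sees the percent-escaped bytes; nothing in the request says which charset produced them.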

Is there a way for the server to detect which encoding the URL uses, other than just setting a fixed value?

PS: I know the basics of character encoding and the difference between UTF-8 and Unicode.

du369
  • [This SO question](http://stackoverflow.com/questions/1549213/whats-the-correct-encoding-of-http-get-request-strings) deals with the issue. The standards demand that the URL be represented in ISO-8859-1; however, if you apply percent-encoding to UTF-8 octet sequences, this is not a restriction. The consensus seems to be that this is the way to go. – collapsar Apr 23 '16 at 20:19
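A minimal sketch of the approach the comment describes, assuming Java 10+: percent-encode every query key and value as UTF-8 octets, so the URL on the wire consists only of printable ASCII. `Utf8QueryBuilder`, `buildQuery` and `isAscii` are hypothetical names for illustration. `URLEncoder` implements `application/x-www-form-urlencoded`, so a space becomes `+` rather than `%20`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a query string from UTF-8 percent-encoded pairs.
final class Utf8QueryBuilder {

    // Build "k1=v1&k2=v2", percent-encoding each key and value as UTF-8 bytes.
    static String buildQuery(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // True if every char is printable US-ASCII (0x20..0x7E).
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c >= 0x20 && c < 0x7F);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "Gr\u00fc\u00dfe");  // German greeting with u-umlaut and sharp s
        params.put("city", "\u6771\u4eac");  // Tokyo in kanji
        String query = buildQuery(params);
        System.out.println(query);
        System.out.println(isAscii(query));
    }
}
```

With `q` set to "Grüße" and `city` set to "東京", `buildQuery` returns `q=Gr%C3%BC%C3%9Fe&city=%E6%9D%B1%E4%BA%AC`, which is pure ASCII. The server still has to decode the escaped bytes as UTF-8 (for example via `URIEncoding="UTF-8"`) to get the original text back.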

1 Answer


The server dictates which charset(s) it will accept for (percent-encoded) URLs to its resources. If the client sends a URL in the wrong charset, it will not be decoded correctly. However, there is no protocol that lets the server advertise its desired charset(s), so it is something of a catch-22. If the URL originates from an HTML page, use the charset of that page. Otherwise you just have to guess, and if the server does not accept UTF-8, you will probably guess wrong.
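On the server side, "you just have to guess" can be turned into a heuristic. The Java sketch below is an assumption-laden illustration, not something the answer prescribes and not what Tomcat's `URIEncoding` attribute does: decode the percent-escaped bytes strictly as UTF-8, and fall back to one assumed legacy charset (windows-1252 here) only if the bytes are not valid UTF-8. `GuessingDecoder`, `percentDecodeToBytes` and `decodeWithFallback` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic: prefer UTF-8, fall back to one assumed legacy charset.
final class GuessingDecoder {

    // Turn "%XX" escapes into raw bytes; '+' becomes a space (form encoding).
    // Minimal sketch: assumes an ASCII-only input; a '%' without two following
    // characters is copied literally, and invalid hex throws NumberFormatException.
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else if (c == '+') {
                out.write(' ');
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Strictly decode as UTF-8; if the bytes are malformed UTF-8, decode them
    // with the fallback charset instead. This is a guess, not detection.
    static String decodeWithFallback(byte[] bytes, Charset fallback) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        Charset legacy = Charset.forName("windows-1252"); // assumed fallback
        System.out.println(decodeWithFallback(percentDecodeToBytes("caf%C3%A9"), legacy)); // UTF-8 path
        System.out.println(decodeWithFallback(percentDecodeToBytes("caf%E9"), legacy));    // fallback path
    }
}
```

Both `caf%C3%A9` (UTF-8) and `caf%E9` (windows-1252) decode to "café". The guess can still be wrong: a windows-1252 client that really meant "cafÃ©" also sends `caf%C3%A9`, and the heuristic reads it as UTF-8. It also only knows the single fallback charset it is given, which is exactly the catch-22 described above.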

Remy Lebeau