When a web server receives a form POST, parsing the body into parameter/value pairs is quite straightforward. However, if the values contain non-ASCII characters that the browser has percent-encoded, the server must know which charset was used in order to decode them.
I've examined the requests sent by two form posts: one from a page using UTF-8, and one from a page using Windows-1255. The same text was encoded differently in each. AFAIK, the Content-Type header could contain a charset parameter after application/x-www-form-urlencoded, but it didn't (using Firefox).
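To illustrate what I mean by "encoded differently", here is a minimal sketch using the JDK's own URLEncoder and URLDecoder (Java 10+) rather than a real browser. The class name FormCharsetDemo and the sample character are made up for the example, and ISO-8859-1 stands in for the legacy page charset only because every JVM is guaranteed to support it; Windows-1255 would behave analogously.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormCharsetDemo {
    // Percent-encode a form value using the charset of the submitting page.
    static String encode(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Decode a percent-encoded form value using the charset the server assumes.
    static String decode(String encoded, Charset assumedCharset) {
        return URLDecoder.decode(encoded, assumedCharset);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with acute accent, written as an escape to avoid source-encoding issues

        String asUtf8 = encode(text, StandardCharsets.UTF_8);
        String asLatin1 = encode(text, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8);   // %C3%A9
        System.out.println(asLatin1); // %E9

        // Decoding only round-trips when the server assumes the same charset.
        System.out.println(decode(asUtf8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(decode(asUtf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

The encoded bytes alone do not say which charset produced them, which is why the server needs that information from somewhere else.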
In a servlet, when you use request.getParameter(), you're supposed to get the decoded value. How does the servlet container do that? Does it always bet on UTF-8, use some heuristics, or is there some deterministic way I'm missing?
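To make the question concrete, this is roughly the kind of deterministic logic I could imagine: read the charset parameter from the Content-Type header if it is there, and otherwise fall back to some default. This is only a sketch in plain Java, not what any container actually does; RequestCharset and fromContentType are hypothetical names, and the fallback is whatever the caller passes in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestCharset {
    // Return the charset named in a Content-Type header value,
    // or the caller-supplied fallback if none is given or it is not recognized.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the rest are parameters such as charset=...
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // arbitrary default for the demo
        System.out.println(fromContentType("application/x-www-form-urlencoded; charset=UTF-8", fallback));
        System.out.println(fromContentType("application/x-www-form-urlencoded", fallback));
    }
}
```

With the header Firefox sent me (no charset parameter), this would always end up at the fallback, which is exactly the case I'm asking about.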