App Engine is not respecting req.setCharacterEncoding("UTF-8") when reading the request body.

This is how I read the request body:

StringBuilder sb = new StringBuilder();
BufferedReader reader;

req.setCharacterEncoding("UTF-8");
reader = req.getReader();

String line;
while ((line = reader.readLine()) != null) {
    sb.append(line).append('\n');
}
reader.close();

// parse body as JSON
data = new JSONObject(sb.toString());

Requests with non-English characters are parsed properly on the local test server (mvn appengine:devserver), but the version deployed to production (mvn appengine:update) does not parse them; they are read as ?. This discrepancy is what really confuses me.

I also tried setting environment variables like

<env-variables>
    <env-var name="DEFAULT_ENCODING" value="UTF-8" />
</env-variables> 

in appengine-web.xml, but that doesn't change anything.

What could be causing the production server to fail to parse non-English characters?

Tony Tang
  • Have you perhaps performed any call to a req.getParameter...() before the req.setCharacterEncoding call? That notoriously causes the req body to be entirely parsed and is one of the factors making setCharacterEncoding quite fragile. – Alex Martelli Nov 21 '15 at 16:51

3 Answers


I don't really know why it wouldn't parse the body properly. I needed to parse the body to validate it before passing it on to my backend for further processing. So, instead of parsing it in GAE, I relayed the body as a byte array to the backend and let the backend handle the validation. This was the only working solution I could find.
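
A minimal sketch of the byte-relay idea, assuming the body is available as an InputStream (in a servlet that would be req.getInputStream()). The class name RawBodyRelay and the helper readAllBytes are made up for illustration, and a ByteArrayInputStream stands in for the real request so the example runs on its own. The point is only that the bytes are copied as-is, with no character decoding on the front end:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RawBodyRelay {

    // Hypothetical helper: copies the stream into a byte array
    // without decoding any characters along the way.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the request body; \u00e9 is a non-ASCII character.
        byte[] original = "{\"name\":\"Jos\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        InputStream body = new ByteArrayInputStream(original);

        byte[] relayed = readAllBytes(body);

        // The relayed bytes are identical, so the backend can decode them itself.
        System.out.println(Arrays.equals(original, relayed));
    }
}
```

Because nothing is decoded here, the front end's default charset never comes into play; whichever service finally builds a String from the bytes decides the charset.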

Tony Tang
  • This seems to be a more robust solution, as setCharacterEncoding has well-known limitations (e.g., see http://stackoverflow.com/questions/3278900/httpservletrequest-setcharacterencoding-seems-to-do-nothing for a thorough discussion and a Tomcat-only way to hack it, the latter unfortunately not helping unless you're using Tomcat, which, on GAE, you aren't). – Alex Martelli Nov 21 '15 at 16:53
  • might be a bit too late for you Tony, but I encountered the exact same problem and have posted a solution that worked for me. – sosale151 May 21 '16 at 00:14

Make sure you set the content-type header on your request correctly - on the client side, as in:

requestBuilder.setHeader("Content-type", "application/json; charset=utf-8");
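
The header matters because, as far as I understand the servlet API, the container takes the request's character encoding from the charset parameter of Content-Type when one is present. If you end up decoding the body yourself, a rough sketch of honoring that parameter could look like the following. The class ContentTypeCharset and the helper charsetFrom are hypothetical names, and falling back to UTF-8 is my own assumption for JSON bodies, not something the servlet API does for you:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or UTF-8 when the header is missing, has no charset parameter,
    // or names a charset this JVM does not know.
    static Charset charsetFrom(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("application/json; charset=utf-8"));
        System.out.println(charsetFrom("application/json"));
    }
}
```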
Andrei Volgin

I had a similar problem and this is the solution that worked for me. What I learned was that by the time the string is completely built (or appended to the string builder), it's too late because you need to specify the charset while reading the bytes and building the string.
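
To illustrate why the charset has to be chosen at decode time, here is a small standalone sketch (the class name DecodeDemo and the helper decode are made up for the example). It decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1, which I believe is the servlet default when no encoding is specified. The exact garbage you see in production depends on which charset the server actually falls back to, so this shows the mechanism rather than the precise ? symptom:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {

    // Decodes raw bytes with the given charset; this is the only step
    // where the charset choice can still make a difference.
    static String decode(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String original = "Jos\u00e9"; // ends in a non-ASCII character
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(wire, StandardCharsets.UTF_8);
        String asLatin1 = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals(original));   // true
        System.out.println(asLatin1.equals(original)); // false
    }
}
```

Once the wrong charset has been applied, the resulting String no longer matches what the client sent, so the fix has to happen where the bytes are turned into characters.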

The request.setCharacterEncoding doesn't work well in this regard, for reasons I'm unsure of.

The alternative I used for this was:

    StringBuilder stringBuilder = new StringBuilder();
    BufferedReader bufferedReader = null;
    try {
        InputStream inputStream = request.getInputStream();
        if (inputStream != null) {
            // Decode the raw bytes as UTF-8 explicitly instead of relying on the container.
            bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
            char[] charBuffer = new char[128];
            int charsRead;
            // read() returns the number of chars read, or -1 at end of stream.
            while ((charsRead = bufferedReader.read(charBuffer)) > 0) {
                stringBuilder.append(charBuffer, 0, charsRead);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (bufferedReader != null) {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    String body = stringBuilder.toString();

I took the raw byte stream directly from the request and wrapped it in an InputStreamReader with an explicit charset, then read characters from it through a BufferedReader. Because the charset is specified at that point, the bytes are decoded as UTF-8 while the string is being built.
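
For what it's worth, the same idea can be written more compactly with try-with-resources (Java 7+). This is only a sketch: the class BodyReader and the method readUtf8 are names I made up, and a ByteArrayInputStream stands in for request.getInputStream() so the example can run outside a servlet container:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BodyReader {

    // Hypothetical helper: reads the whole stream and decodes it as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) for us.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, charsRead);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"name\":\"Jos\u00e9\"}";
        byte[] wire = json.getBytes(StandardCharsets.UTF_8);

        String body = readUtf8(new ByteArrayInputStream(wire));

        System.out.println(body.equals(json)); // true
    }
}
```

In a servlet you would call it as readUtf8(request.getInputStream()), before anything else has consumed the body.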

sosale151