WebApp on multiple servers with different encodings (UTF-8 vs CP-1252)

Question

The source code of my webapp is written in UTF-8 encoded files. It is deployed to different Tomcat servers, which use either UTF-8 or CP-1252 as their encoding. While everything is fine on the Tomcats with UTF-8 encoding, the CP-1252 Tomcats have problems with special characters such as umlauts in strings (e.g. in database queries).

I have no access to the Tomcat configurations, so I can't just switch all to UTF-8.
I could replace all umlauts with Unicode escapes, but that is error-prone, since it is easy to forget one.
The same applies to re-encoding each string with `new String(value.getBytes(), StandardCharsets.UTF_8)`.

So is there any solution to this problem? Searching for encoding and Tomcat always leads to "yeah, edit the server.xml of your Tomcat and change the encoding to UTF-8", which doesn't help me.

Wait a minute. If you're deploying an app, the internal Strings won't be affected by the platform encoding. How can you claim that database queries are affected? Also your "recoding" solution can't work. Lastly, it's 2018, why are you using Cp1252? — Kayaman, Mar 08 '18 at 15:24
The JVM also runs with the crappy Cp1252 encoding, thus a string like "täst" in the source code (which is in UTF-8) is going to be messed up. I found a bug where only one query was affected: the one with an umlaut. Also, when the string is written to the console it looks messed up. The recoding solution does work, it fixes the console output and the database query ;) I am not using Cp1252 myself, but some of the Tomcat servers that I have no control over do. I use UTF-8 for my source code. — Holger Rattenscheid, Mar 08 '18 at 15:57
Are you compiling it separately on every server, or why does it affect the source code? Your recoding "solution" works by chance, not by design. The platform encoding affects things only when you do things like write data using the default encoding (instead of explicitly using UTF-8 for example). — Kayaman, Mar 08 '18 at 16:05
When you say "source code", do you mean the source of JSP files? Do you have access to the configuration of the *webapp*, even if you don't have access to the configuration of the Tomcat server itself? — Christopher Schultz, Mar 08 '18 at 16:12
Ah, that would explain. Old style raw JSPs. Then I'd recommend changing the Tomcat encodings. If you don't have the authority to do it yourself, find someone who has. — Kayaman, Mar 08 '18 at 16:18
@Kayaman: It is compiled on my machine and the WAR is deployed to each Tomcat. Let me explain my test setup: a Windows machine with Eclipse and a workspace that is configured to UTF-8 (-> UTF-8 source). I installed a Tomcat on the same Windows machine, which runs with the default Windows encoding. I create the WAR in Eclipse and copy it to Tomcat. Result: things are messed up as described. If I copy the same WAR to my Linux server's Tomcat, everything is fine. @Christopher: I mean Java source code (for a REST service), and yes, I have access to the web.xml. — Holger Rattenscheid, Mar 08 '18 at 16:27
Maybe you should show some relevant pieces of code. If you've compiled a class with something like `String foo = "Leberkäse";`, after the compilation nothing can change the `ä` in there. You're also talking about the source code constantly, when that's not relevant (unless you really are compiling the sources on every environment). — Kayaman, Mar 08 '18 at 16:39
Yes, I talk constantly about the source code, because this is what I observe after compiling it ONCE and then deploying it on different Tomcats. Here is a code example: `System.out.println("Leberkäse");` and this is what the output looks like on the Windows Tomcat: "LeberkÃ¤se" — Holger Rattenscheid, Mar 08 '18 at 16:52
Yeah, and that has nothing to do with the source code. It looks like everything is working properly. If you set your terminal encoding correctly, you'll see an `ä`. — Kayaman, Mar 08 '18 at 17:02
See this related question: https://stackoverflow.com/questions/48813437/why-are-results-of-path-tostring-failing-to-show-all-characters-on-linux-but-o/48825543#48825543 — Kayaman, Mar 08 '18 at 17:04
But the same terminal shows "Leberkäse" if I run the same code outside of Tomcat... So the encoding of the terminal is fine for UTF-8. Also, there is still the SQL problem with "Leberkäse": a query with "like 'Leberkäse'" works on the Linux Tomcat but breaks on the Windows Tomcat with the same WAR (all Tomcat instances query the same MySQL instance, which is configured to UTF-8). Hence I can't see the relation to the other Stack Overflow question. I don't read any external stuff, it is just in the (I don't want to say it...) source code... — Holger Rattenscheid, Mar 08 '18 at 18:51
The other way around: if I switch the source code encoding to the native Windows encoding (so Cp1252) and place the WAR in the Windows Tomcat, console output works and the SQL query works. But this time the same WAR breaks the SQL query on the Linux Tomcat. — Holger Rattenscheid, Mar 08 '18 at 18:52
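
To make the symptom from the comments concrete, here is a minimal sketch (not the asker's code) of how the garbled "LeberkÃ¤se" arises at the byte level: the UTF-8 bytes of "Leberkäse" are decoded as windows-1252. It also shows why the `new String(value.getBytes(), ...)` re-coding only helps when the string is already garbled and the default charset happens to be Cp1252. The class and method names are hypothetical, and all charsets are passed explicitly so the result does not depend on the platform default. The sketch does not establish where in the asker's build or deployment the wrong decoding actually happens.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
public class MojibakeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode text as UTF-8, then decode those bytes with a different charset.
    static String decodeUtf8BytesAs(String text, Charset wrongCharset) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrongCharset);
    }

    // Same idea as new String(value.getBytes(), UTF_8), but with the
    // "default" charset made explicit so the effect is reproducible.
    static String recode(String value, Charset assumedDefault) {
        return new String(value.getBytes(assumedDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escape keeps this file independent of its own encoding.
        String original = "Leberk\u00e4se";

        // The two UTF-8 bytes of the umlaut (0xC3 0xA4) become two
        // separate windows-1252 characters: U+00C3 and U+00A4.
        String garbled = decodeUtf8BytesAs(original, CP1252);
        System.out.println(garbled.length());   // one char longer than original

        // Re-coding undoes the damage if the string is already garbled...
        System.out.println(recode(garbled, CP1252).equals(original));

        // ...but damages a string that was correct to begin with.
        System.out.println(recode(original, CP1252).equals(original));
    }
}
```

In this sketch the second check prints `true` and the third prints `false`, which matches the remark that the re-coding works by chance rather than by design: it is only a repair for text that has already been decoded with the wrong charset.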


0 Answers