I'm getting incorrectly encoded output for UTF-8 text that a tomcat:8.0 container retrieves from a mysql:5.6 container.

Connecting to the MySQL container directly and querying on the shell proves the text is stored in the database correctly.

UTF-8 content within the templates is also output correctly by the Tomcat container.

The JDBC connector string reads: nfc.jdbc.mysql.url=jdbc:mysql://mysql:3306/mydatabase?autoReconnect=yes&useUnicode=yes&characterEncoding=UTF-8

Here's the tomcat Dockerfile I'm using:

FROM tomcat:8.0
RUN apt-get update && \
    apt-get -y install libmysql-java
RUN echo 'CLASSPATH=/usr/share/java/mysql.jar' >> /usr/local/tomcat/bin/setenv.sh 

And the MySQL Dockerfile:

FROM mysql:5.6

RUN { \
    echo '[mysqld]'; \
    echo 'character-set-server = utf8'; \
    echo 'collation-server = utf8_unicode_ci'; \
    echo '[client]'; \
    echo 'default-character-set=utf8'; \
    echo '[mysql]'; \
    echo 'default-character-set=utf8'; \
} > /etc/mysql/conf.d/charset.cnf
VOLUME /var/lib/mysql

The Tomcat run command is:

docker run \
  --rm \
  --name tomcat-server \
  --volume $(pwd)/../../webapp:/usr/local/tomcat/webapps/mywebapp \
  --volume $(pwd)/../../tomcat-users.xml:/usr/local/tomcat/conf/tomcat-users.xml:ro \
  --link mysql-server:mysql \
  --publish 8088:8080 \
  --tty \
  --interactive \
  tomcat-server

I'm using the same MySQL image to provide content to other Docker container web servers (Python/Django), and there the content is pulled and output with the correct encoding.

I have no real understanding of the contents of the tomcat webapp and don't really know Java.

The developer has demonstrated the application running from a Windows server producing the correctly encoded data, however they have no understanding of Docker, and so we're currently spinning our wheels not getting anywhere!

DanH
  • You have a UTF-8 configured Linux environment on one side and a Java developer who uses Windows on the other. The default encoding under Windows is CP-1252. Every Java component, including your server, uses the system encoding by default unless explicitly set to UTF-8. Every property file, database file, or source file can be in CP-1252 if the IDE is not configured correctly. So where does your encoding actually break? Where do you enter the character into the system and where do you see a broken character? From the UI? A property file? External services? A database script? – blacklabelops Jul 14 '15 at 18:12
  • The developer also advised on the JDBC connection string component `characterEncoding=UTF-8`, so they must also expect the MySQL database to provide UTF-8 encoded data. The content is broken when viewing the web interface: the sidebar menu items are retrieved from the database and presented seemingly as is. The output of these sidebar menu items is incorrectly encoded, in contrast to the db table, whose contents output correctly via the MySQL shell. – DanH Jul 15 '15 at 08:46
  • The tomcat container is not configured correctly. According to the comments in https://registry.hub.docker.com/_/tomcat/ the container is not configured for UTF-8 by default. You will have to build your own Docker image extending the default tomcat container and set the encoding appropriately. Someone wrote that `RUN locale-gen en_US.UTF-8`, `ENV LANG en_US.UTF-8`, `ENV LANGUAGE en_US:en` and `ENV LC_ALL en_US.UTF-8` should work. – blacklabelops Jul 15 '15 at 09:30
  • Thanks for the link. I did as suggested and rebuilt the Tomcat Dockerfile using an ubuntu:14.04 base with the language env vars, but no change. I also used http://www.tutorialspoint.com/jsp/jsp_database_access.htm to write a basic JDBC query and output, to ensure the webapp wasn't doing anything funny, and that too reproduced the problem. I tried both tomcat8 from Apache as per the tomcat8 Dockerfile, and tomcat7 from the Ubuntu repo, no change. – DanH Jul 15 '15 at 12:02
  • Well, someone isn't taking the medicine and denies UTF-8. Try this CLI to access and query the database: http://quuxo.com/products/jdbctool/. Start the CLI with `java -Dfile.encoding=UTF-8`. Hopefully MySQL is at least working right and the tables are all in UTF-8. – blacklabelops Jul 15 '15 at 12:58
  • Thanks, I've already been tinkering with this; `CLASSPATH=/usr/share/java/mysql.jar bin/jdbctool -u root -p pass jdbc:mysql://mysql:3306/nfcapi?characterEncoding=UTF-8` connects and queries, but the encoding is still bad. Also, the JDBC string won't accept multiple parameters: anything from `&` onwards causes an error, e.g. `?characterEncoding=UTF-8&useUnicode=yes`, and vice versa. How might I run this via the java binary though? Thanks again. – DanH Jul 15 '15 at 13:08
  • Well at least it's narrowed down between the database and JDBC. You sure your database, tables and columns are really in UTF-8? – blacklabelops Jul 15 '15 at 13:23
  • According to http://stackoverflow.com/a/1049958/698289 the database/table/column are all utf8 – DanH Jul 15 '15 at 13:28
  • Very good, then try SQuirreL SQL, a nice SQL client written in Java. I have had good experience with it, though I don't currently have it installed. http://squirrel-sql.sourceforge.net/ Then use this to access your data with SQuirreL SQL in a proper UTF-8 way: http://zaharov.info/notes/3_316_1.html – blacklabelops Jul 15 '15 at 13:33
  • Looks like SQuirreL is also returning wrongly encoded data. – DanH Jul 15 '15 at 14:20
  • Also, I tried inserting Chinese characters into the table from within SQuirreL, and those characters output as `????` when viewed from the MySQL client shell (the only client so far able to return the original text as desired). The inserted data returns to SQuirreL as it was input, however. – DanH Jul 15 '15 at 14:22
  • Sorry, I do not understand that sentence. Do we now have two applications both correctly configured for UTF-8? And the encoding still breaks? – blacklabelops Jul 15 '15 at 14:29
  • Sorry, let me clarify with a single sample string `综合管理`, from the original data import from the developer, MySQL CLI client returns correct text `综合管理`, but Tomcat/SQuirreL return weird text e.g. `综åˆç®¡ç†`. However if I insert via SQuirreL the text `综合管理`, then it outputs fine in SQuirreL/Tomcat, but instead retrieves strangely in MySQL as simply `????`. – DanH Jul 15 '15 at 14:42
  • Now who is not running in UTF-8? – blacklabelops Jul 15 '15 at 14:56
  • As far as I can tell, all are configured to do so, so I'm very confused as to which one is lying. I can't see any way of telling any of the clients to report their effective setting. I'm wondering if the character replacements of `综åˆç®¡ç†` or `????` are telling in this situation? – DanH Jul 15 '15 at 15:01
  • I've also analyzed the original SQL import, it is definitely UTF8 encoded. Loading it into a text editor with ISO-8859-1 encoding shows the same `综åˆç®¡ç†` text as JDBC clients are showing. – DanH Jul 15 '15 at 15:29
  • I'm lost, but I have read this: utf8mb4 is MySQL's UTF-8. If you use utf8 in MySQL, you're missing out on an important portion of Unicode (the astral symbols). Switch everything, tables and database, to utf8mb4 or you may lose data. – blacklabelops Jul 15 '15 at 15:37
  • Can you please check the database again after reading this? https://mathiasbynens.be/notes/mysql-utf8mb4 They say full Chinese does not work with MySQL's utf8. – blacklabelops Jul 15 '15 at 15:38
  • Thanks very much for that article. In fact the solution to everything seems to be setting `[mysqld] character-set-client-handshake = FALSE`. With that, even the original `utf8` is now working, no need for `utf8mb4`. For the record, `utf8mb4` WITHOUT setting the handshake var did not help. Feel free to post an answer and I'll accept :) – DanH Jul 16 '15 at 08:28
  • I assume the error is that JDBC says UTF-8 while MySQL internally uses utf8? – blacklabelops Jul 16 '15 at 08:42
  • Well, running `SHOW VARIABLES LIKE '%char%';` reveals that even MySQL CLI is not respecting the server charset in terms of result/client charset. But setting `character-set-client-handshake = FALSE` appears to force the client to apply the server defined charset across all other charset variables. – DanH Jul 16 '15 at 08:44
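
The fix reported in the last few comments can be sketched as a small change to the MySQL Dockerfile from the question. This is only a sketch: the thread does not show the final file, so placing the setting in the same `charset.cnf` under `[mysqld]` is an assumption, and everything else is copied from the original Dockerfile.

```dockerfile
FROM mysql:5.6

# Same charset config as in the question, plus one extra [mysqld] line:
# character-set-client-handshake = FALSE makes the server ignore the
# charset requested by the client and apply its own (utf8 here).
# Assumption: the line lives in the same charset.cnf file.
RUN { \
    echo '[mysqld]'; \
    echo 'character-set-server = utf8'; \
    echo 'collation-server = utf8_unicode_ci'; \
    echo 'character-set-client-handshake = FALSE'; \
    echo '[client]'; \
    echo 'default-character-set=utf8'; \
    echo '[mysql]'; \
    echo 'default-character-set=utf8'; \
} > /etc/mysql/conf.d/charset.cnf
VOLUME /var/lib/mysql
```

After rebuilding the image and restarting the container, running `SHOW VARIABLES LIKE '%char%';` from any client is one way to check whether the client, connection and results charsets now follow the server setting.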

0 Answers