String with encoding from MYSQL DB Query comes through as mangled in Java
Using Java, I am connecting to a MYSQL database where I query a table to glean a field from it.
The field has UTF8 encoded text, that is:
Córas Éireann
The above text is what I see when I log in to the MYSQL console and look at that row in the table. The mysql version deployed is: mysql Ver 14.14 Distrib 5.1.52, for unknown-linux-gnu (x86_64) using readline 5.1
If I use a python program to connect to the same database and query the same table and get the same row, the text looks correct, i.e. it comes as Córas Éireann
However, when I query it through Java, the text comes out as:
Córas Éireann
which I suspect is Western (ISO latin1), but I am not sure, just a guess.
I did a show table status and saw that the table I am querying has Collation of utf8_general_ci
The table I am querying does not have any adddtionaly encoding definitions, the field, name, is a varchar(512).
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(512) | YES | | NULL | |
The SQL connect JAR I imported is mysql-connector-java-5.1.36, I also tried v 5.1.34 and 5.0.8 but that made no difference.
This is how I connect to the DB:
String dbStr =
String.format("jdbc:mysql://%s:%d/%s?useUnicode=yes&characterEncoding=UTF-8", LOCAL_MYSQL_HOST,
LOCAL_MYSQL_PORT, LOCAL_MYSQL_DB);
try {
cdb = DriverManager.getConnection(dbStr, LOCAL_MYSQL_USER, LOCAL_MYSQL_PASS);
Statement dbStatement = cdb.createStatement();
String query = String.format("SELECT name FROM customer WHERE id=%d",customerId);
ResultSet row = dbStatement.executeQuery(query);
if (row.first()) {
System.out.println("name is " + row.getString("name");
}
} catch (SQLException exc) {
exc.printStackTrace();
}
Please note, my original implementation did not contain
?useUnicode=yes&characterEncoding=UTF-8
but adding it did not make it better or worse. I added it as I thought this might be the culprit. I also tried latin1 instead of utf-8 to see if this made a difference but no luck, the result is exactly the same, i.e. it comes out as Córas Éireann.
I even tried things like:
byte[] rowBytes = row.getBytes("name");
String utfdecocedStr = new String(rowBytes, "UTF-8");
System.out.println(utfdecocedStr);
but the output remains as Córas Éireann
In python, I do not do any decoding/encoding, the query and connection are all basic, and I get the correct string. Is there a step I am missing that needs to be done on the DB or Java side to get this to work? In my.conf, I do not have any settings to setup any encoding config.
Python approach:
import MySQLdb
cdb = MySQLdb.connect(host=LOCAL_MYSQL_HOST,port=LOCAL_MYSQL_PORT,
user=LOCAL_MYSQL_USER,
passwd=LOCAL_MYSQL_PASS,db=LOCAL_MYSQL_DB)
ccursor = self.cdb.cursor()
query = """SELECT name FROM customer WHERE id='%d' """%(customer_id)
row = ccursor.execute(query)
if row:
customername = ccursor.fetchone()
Thank you...Amro
---------Update 20150811---------------
I ran the following commands on the database and I found the following configuration which probably explains the behavior I am seeing:
show variables like 'character%';
+--------------------------+---------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/percona-xtradb-cluster/charsets/ |
+--------------------------+---------------------------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
By adding the following to my.cnf:
character-set-server = utf8
character-set-filesystem = utf8
Then it the tables evolve to:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/percona-xtradb-cluster/charsets/ |
+--------------------------+---------------------------------------------+
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_general_ci |
+----------------------+-------------------+
So I issued the command on the DB console:
ALTER DATABASE <dbname> CHARACTER SET utf8;
show variables like 'character%';
+--------------------------+---------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/percona-xtradb-cluster/charsets/ |
+--------------------------+---------------------------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
Unfortunately this did not resolve the issue still.
Can someone let me know how I can correct this hopefully without purging the database?