0

I have set my MariaDB to Default Character Set UTF-8. According to Blue Box Blog Character Set Hell, setting that will save me the problem from unrecognized accented character like 'é'.

However, I can't get some of the results with

SELECT title FROM tablename WHERE title = 'sometext'

where sometext need not necessary have accented characters.

Is there a unique approach to handle utf8 character set? Because it looks like it doesn't recognize string in java.

In Java Code

String query = "SELECT title FROM tablename WHERE title=?";

PreparedStatement pState = conn.getPreparedStatement(query);
ResultSet result;
String textResult = null;

pState.setString(1, "sométéxt");
result = pState.executeQuery();

if(result.next())
  textResult= result.getString(1);

System.out.println(textResult);

Output:

null

Other approach

1)

String query = "SELECT title FROM tablename WHERE CONVERT(title USING utf8)=?";

2)

String query = "SELECT title FROM tablename WHERE title=?";

byte[] req = "sométéxt".getBytes("UTF-8");
pState.setBytes(1,req);

output

null

Problem faced before Char Set=utf8

Prior to the Character Set=utf8, my first setup of the database was with the default Character Set, latin1_swedish_ci. With my field 'tablename' datatype=VARBINARY(), the unrecognized accented can be solved with

SELECT * FROM tablename WHERE CONVERT(title USING utf8) = 'sometext'

However, because my database has more than 10 million rows. Using CONVERT function on the whole 'title' field makes the SQL query and Java programme run very slowly.

Hoejhl
  • 38
  • 6
  • I am very new to Java. If there is some parts in my question that is not clear, please forgive me. :) Do help me out by commenting or editing my question. Thanks! – Hoejhl Jun 21 '13 at 02:03

2 Answers2

2

Your choice of character set determines how your textual information is stored. But it's your choice of collation that determines how your WHERE clauses match that information.

Try

  SELECT title FROM tablename WHERE title=? COLLATE utf8_general_ci

If that works -- if you start getting the rows you want -- alter the title column in your tablename table as follows

   ALTER TABLE tablename1 
   CHANGE title title VARCHAR(xxxx) CHARACTER SET utf8 COLLATE utf8_general_ci NULL 

You should actually use the existing definition of the title column and just throw in the COLLATE utf8_general_ci clause. Once you've made this change, you may need to rebuild your indexes. But at any rate your matching will start to work correctly

If you're working with a particular language you may want to adopt its own collation. For example, in the general collation N and Ñ are treated as the same letter, but in Spanish they are not.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • When I add utf8_unicode_ci in my code, error show COLLATION 'utf8_unicode_ci' is not valid for "CHARACTER SET 'latin1'" I check my CharacterSet according to this [stackoverflow](http://stackoverflow.com/questions/1049728/how-do-i-see-what-character-set-a-database-table-column-is-in-mysql), schemaCharSet=utf8, tableCharSet=binary & columnCharSet=null. I don't know what did I did wrong? – Hoejhl Jun 21 '13 at 05:52
  • 1
    This error means the character set for this column of your table is still `latin1`. You haven't succeeded in actually changing it in your database yet. Don't give up: sometimes it takes some fiddling around to get this right. Use `SHOW CREATE TABLE tablename` to get your MySQL server to tell you how the table is actually set up. – O. Jones Jun 21 '13 at 11:59
  • Since my table's CHARSET is binary, if let say, my data initially was stored with the default CHARSET (because I didn't specified the column's CHARSET and MySQL default is latin1), converting the column to utf8 following the method in this [doc](http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html) from line23, will MySQL able to convert the data properly since binary datatype value has to be converted with the CHARSET which I converted the column to [[1](http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html)] _first pointer_? – Hoejhl Jun 24 '13 at 02:00
  • Columns have character sets. Tables have default character sets. Is your table's character set actually `binary`? At any rate, what is the character set of the column you are concerned with? Sorry to be pedantic, but there is no other way to get this right. Java String instances have a native Unicode encoding and the MySQL interface software (the JDBC driver) is good, therefore when you read your information from MySQL into Java it should be what you expect. You need to do some experimenting to figure these things out. – O. Jones Jun 24 '13 at 11:32
0

Try to use this when writing to db if everything else is set to utf-8 (will work 100%):

public String to_utf8(String fieldvalue) throws UnsupportedEncodingException{

        String fieldvalue_utf8 = new String(fieldvalue.getBytes("ISO-8859-1"), "UTF-8");
        return fieldvalue_utf8;
}
DarioBB
  • 663
  • 2
  • 8
  • 29