10

In a Rails 4.0 app with Ruby 2.0, Turkish characters lead to the following ActiveRecord / MySQL error while trying to insert a record in the database. The problematic characters are for instance ğ and ş, but there is no problem with ü or Ç (which also seem to occur in latin charsets).

ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: 
                               '\xC4\x9Fu\xC5\x9F ...' for column ...

How do you prevent this error? The application and the database use UTF-8 as standard encoding. "xC4\x9F" is UTF-8 encoding for "ğ", "\xC5\x9F" is UTF-8 for "ş". Both seem to be the problematic special characters. There is no problem with German (äöü) or French (áàâ) special characters. Contrary to ISO 8859-1 or ISO 8859-15 (only ISO 8859-9 supports all Turkish characters) it should be possible to store Turkish characters in UTF-8.

The MySQL collection settings for the database are as follows. Would it be helpful to switch collection_database to a different value, such as utf8_unicode_ci?

mysql> show variables like '%collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
Community
  • 1
  • 1
0x4a6f4672
  • 27,297
  • 17
  • 103
  • 140

2 Answers2

21

The reason was apparently that the database (and especially the database tables) had the wrong character set. Switching the collation of the database alone did not help. Inspection of the database tables showed that each table still had the latin1 charset, which can not store all utf8 characters:

mysql> show table status;
+----------+--------+-------------------+ ..
| Name     | Engine | Collation         | ..
+----------+--------+-------------------+ ..
| my_table | InnoDB | latin1_swedish_ci | ..

Therefore I altered the character set of the table directly:

ALTER TABLE my_table CONVERT TO CHARACTER SET utf8;

Then it finally worked, and the character set is how utf8

   mysql> show table status;
    ... +-------------------+ ..
    ... | Collation         | .. 
    ... +-------------------+ ..
    ....| utf8_general_ci   | ..
0x4a6f4672
  • 27,297
  • 17
  • 103
  • 140
0

Try applying force_encoding to encode your unicode encoded turkish string into UTF-8 like this before saving the string to database.

turkish_string = turkish_string.force_encoding("UTF-8")

Read this for more details

Community
  • 1
  • 1
Litmus
  • 10,558
  • 6
  • 29
  • 44
  • But the string apparently is encoded in UTF-8? It just can't be stored in the MySQL database. – 0x4a6f4672 Oct 11 '13 at 16:11
  • I don't think it will appear like this (byte-by-byte) `\xC4\x9Fu\xC5\x9F` if it was UTF-8, in the error message of mysql2. It means mysql2 gem did not get it as a utf8 string. – Litmus Oct 11 '13 at 16:14