
I'm upgrading an application from Rails 2.3 to Rails 5. One problem we have is with the encoding in the database; we are using MySQL.

On the Rails 2.3 application, if you query the db for our field from the app you get the valid symbol, for example:

€

But if you look directly at the db you see:

â‚¬

Checking the hex representation

select HEX(txt) from table;
+----------------+
| HEX(txt)       |
+----------------+
| C3A2E2809AC2AC |
+----------------+
1 row in set (0.00 sec)
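Those bytes are consistent with "double encoding". A minimal Python sketch reproducing them, under an assumption the question does not confirm: the Rails 2.3 app sent UTF-8 bytes over a connection MySQL treated as latin1, and MySQL then re-encoded those characters into the utf8 column. MySQL's latin1 is effectively Windows-1252, so Python's `cp1252` codec stands in for it; `double_encode` is a hypothetical helper name.

```python
# Sketch: reproduce the stored bytes by double-encoding the Euro sign.
# Assumption: MySQL's "latin1" behaves like Windows-1252 (Python's "cp1252") for these bytes.


def double_encode(text: str) -> bytes:
    """UTF-8 encode, misread the bytes as cp1252, then UTF-8 encode the result again."""
    utf8_bytes = text.encode("utf-8")      # bytes the client sent: E2 82 AC
    misread = utf8_bytes.decode("cp1252")  # read through a latin1 connection: 'â‚¬'
    return misread.encode("utf-8")         # bytes stored in the utf8 column


print(double_encode("€").hex().upper())    # C3A2E2809AC2AC
```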

If I save exactly the same character from the Rails 5 version of the app, I get the correct value when I query the db directly.

Given the length of the hex string, I thought it was UTF-16, but it isn't:

SELECT CHAR(0xC3A2E2809AC2AC USING utf16);
+-----------------------------------+
| CHAR(0xC3A2E2809AC2AC USING utf16) |
+-----------------------------------+
| 肚슬                              |
+-----------------------------------+
1 row in set (0.00 sec)

Now, given that 0xC3A2E2809AC2AC represents €, is it possible to know in which charset that representation is accurate?
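One rough way to probe which charset a byte string is valid in is to try decoding it with a few candidate codecs. A hedged Python sketch; `try_codecs` is a hypothetical helper and the codec list is an arbitrary choice for illustration:

```python
# Sketch: try a few candidate codecs on the stored bytes.
# A successful decode only shows the bytes are *valid* in that codec,
# not that it is the charset the text was meant to be read in.
from typing import Dict, List, Optional

RAW = bytes.fromhex("C3A2E2809AC2AC")


def try_codecs(data: bytes, codecs: List[str]) -> Dict[str, Optional[str]]:
    results: Dict[str, Optional[str]] = {}
    for codec in codecs:
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None  # not a valid byte sequence in this codec
    return results


for codec, text in try_codecs(RAW, ["utf-8", "utf-16-be", "cp1252"]).items():
    print(f"{codec:>10} -> {text!r}")
```

With these bytes, utf-8 decodes but yields 'â‚¬' rather than '€', and utf-16-be fails because seven bytes cannot form whole 16-bit code units; neither reading gives €.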

I think the MySQL adapter (the mysql gem, 2.8.1) is doing some conversion, but I can't find any documentation about it.

The field collation is utf8_general_ci and the db character set is utf8.

Arnold Roa

2 Answers


No, that is not the proper encoding for the Euro sign, at least not directly.

Treated as utf8, C3A2 E2809A C2AC (spacing added) is â‚¬. But if you undo the "double encoding" (that is, convert through latin1 twice), you do get €:

SELECT CONVERT(BINARY(CONVERT(CONVERT(UNHEX('C3A2E2809AC2AC')
                                      USING utf8mb4)
                              USING latin1))
               USING utf8mb4);  -- returns '€'

(In this case, utf8 and utf8mb4 will produce the same results.)
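The same reversal can be sketched in Python for checking values outside MySQL, again assuming Python's `cp1252` stands in for MySQL's latin1; `undo_double_encoding` is a hypothetical helper, with each step annotated against the nested CONVERT() calls:

```python
# Sketch: undo one level of double encoding, mirroring the nested CONVERT() calls.
# Assumption: MySQL's latin1 behaves like Python's cp1252 for the bytes involved.


def undo_double_encoding(raw: bytes) -> str:
    as_utf8 = raw.decode("utf-8")        # inner CONVERT(... USING utf8mb4): 'â‚¬'
    original = as_utf8.encode("cp1252")  # CONVERT(... USING latin1) + BINARY(): E2 82 AC
    return original.decode("utf-8")      # outer CONVERT(... USING utf8mb4): '€'


print(undo_double_encoding(bytes.fromhex("C3A2E2809AC2AC")))  # €
```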

For more discussion, search for "double" in Trouble... and Here. Both give possible fixes for both the system and the data.

Original question

Superficially, your encoding is utf8. But, because of the "double encoding", that conclusion is misleading. See the section "Diagnosing CHARSET issues" in the second link, above.

Rick James

To convert that into UTF-8, export and import the table like this:

mysqldump -u db_user -p --opt --default-character-set=latin1 --skip-set-charset db_name db_table > some_file.sql

Note the --skip-set-charset option: it keeps mysqldump from writing a charset (SET NAMES) statement into the dump.

Then import it with:

mysql -u db_user -p --default-character-set=utf8 db_name < some_file.sql
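Why the round trip helps can be simulated in Python. This is a sketch, not MySQL itself: it assumes Python's `cp1252` stands in for MySQL's latin1 and models the utf8 column's value as a Python str; `dump_as_latin1` and `import_as_utf8` are hypothetical names for the two commands above.

```python
# Simulation of the dump/import round trip; the real work is done by MySQL.
# Assumptions: Python's cp1252 stands in for MySQL's latin1, and the
# utf8 column's value is modeled as a Python str.


def dump_as_latin1(column_value: str) -> bytes:
    # mysqldump --default-character-set=latin1: the value is converted to latin1 bytes,
    # and --skip-set-charset keeps the dump from declaring that charset
    return column_value.encode("cp1252")


def import_as_utf8(dump_bytes: bytes) -> str:
    # mysql --default-character-set=utf8: the same bytes are read as UTF-8
    return dump_bytes.decode("utf-8")


double_encoded = "\u00e2\u201a\u00ac"  # 'â‚¬', as stored by the Rails 2.3 app
dumped = dump_as_latin1(double_encoded)
print(dumped.hex().upper())    # E282AC
print(import_as_utf8(dumped))  # €
```

The same simulation suggests a caveat: a row that already holds a correct € (for example one saved by the Rails 5 app) would dump as the single byte 0x80, which is not valid UTF-8 on its own, so this approach assumes every row in the table is double-encoded.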
Eduard
  • No, the latin1 encoding is simply `80`; the utf8 encoding is `E282AC`; what you have is the "double-encoding". – Rick James Jan 20 '17 at 02:13