1

I have a MySQL database which is as far as I can tell utf8 through and through. It's a database originally used solely by a Ruby on Rails application which has no issue writing and reading the utf8 characters.

However, when attempting to read via a c# app using Dapper I'm seeing a lot of bad characters, for example:

Expected: FELIZ AÑO Actual: FELIZ AÑO

My connection string looks like this:

Server=;Database=;Uid=;Pwd=;Port=;SslMode=;charset=utf8;

I've tried several combinations of charset and utf8 capitalisation but none have worked so far. I read in one thread that the charset in the connection string only affects the SQL query language so if that's correct that'll be why its not helping!

Is there anything else I'm missing potentially? or anything else I can do to be able to read the characters correctly?

Update: Some info from MySQL:

mysql> SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME, COLUMN_TYPE
    -> FROM information_schema.columns
    -> WHERE TABLE_NAME = 'pages'
    -> AND COLUMN_NAME = 'title';

| TABLE_NAME | COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME  | COLUMN_TYPE  |
| pages      | title       | utf8               | utf8_general_ci | varchar(255) |

Update 2: More info, it seems my character sets are a bit messed up..

mysql> show variables like "character_set_%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

mysql> select collation_name from information_schema.columns where table_name = 'pages' and column_name = 'title';
+-----------------+
| collation_name  |
+-----------------+
| utf8_general_ci |
+-----------------+

mysql> select title from pages where id = 3660;
+--------------------------------+
| title                          |
+--------------------------------+
| FELIZ AÑO  |
+--------------------------------+

mysql> set character set 'latin1';

mysql> select title from pages where id = 3660;
+-----------------------------+
| title                       |
+-----------------------------+
| FELIZ AÑO  |
+-----------------------------+

So, database is latin1, default connection is utf8, default client is utf8, column is utf8.

If I query the column directly in mysql, it comes out wrong until I set the character set to latin1.

Doing the exact same thing in C# with Execute to set the character set and ExecuteReader to read the column is still producing the bad characters. Trying to work out the difference.

Update 3 - Screenshot for @BradleyGrainger, this is from SequelPro

enter image description here

matthewrk
  • 677
  • 10
  • 22
  • 2
    Dapper just asks ADO.NET for the value from the data-reader, which is MySQL-specific - it doesn't do any of the interpreting, so: if the specific provider's data-reader is unable to interpret the data, that will impact *any* tool that sits on top of ADO.NET – Marc Gravell Feb 12 '19 at 15:42
  • @MarcGravell Ok understood, I'm using the official MySql.Data.MySqlClient – matthewrk Feb 12 '19 at 16:01
  • 2
    aye; I think while you'e trying to fix the configuration, you may want to just test with a basic `DbCommand` that uses `ExecuteScalar` - until that gets the value right, all bets are off; sadly, I don't know a lot about MySQL configuration (either at the column level, or in the connection-string), so I can't advise directly – Marc Gravell Feb 12 '19 at 16:03
  • 1
    Can you paste a screenshot from MySQL Workbench that shows that the values in the table are encoded correctly? It's possible that your database _actually_ contains `FELIZ AÑO` (encoded in UTF-8) and Ruby on Rails is _double_ decoding it to get the desired string. – Bradley Grainger Feb 12 '19 at 18:42
  • @BradleyGrainger Yeah I had wondered if Rails was perhaps doing something unusual, I've updated the post with the screenshot requested. – matthewrk Feb 12 '19 at 23:11
  • 1
    You are suffering from Mojibake, or possibly "double encoding". – Rick James Feb 13 '19 at 05:28
  • 1
    Please provide `select HEX(title) from pages where id = 3660; – Rick James Feb 13 '19 at 05:29
  • @RickJames: 46454C495A2041C383E280984F – matthewrk Feb 13 '19 at 11:13
  • This works: SELECT CONVERT(CAST(CONVERT(title USING latin1) AS BINARY) USING utf8) title FROM pages WHERE id = 3660; So its latin1 encoded in a utf8 table? – matthewrk Feb 13 '19 at 16:06

1 Answers1

2

OK, it's just Mojibake.

HEX:  46 45 4C 49 5A 20 41 C383 E28098 4F
      F  E  L  I  Z (sp) A  Ã      ‘   O
Mojibaked:  FELIZ AÑO
Should be:  FELIZ AÑO

The instructions for preventing it is in "Mojibake" of Trouble with UTF-8 characters; what I see is not what I stored

After preventing it, one way to "fix" the data would involve an UPDATE to change the column contents thus:

col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)

You can see that via:

SELECT CONVERT(BINARY(CONVERT('FELIZ AÑO' USING latin1)) USING utf8mb4); -- FELIZ AÑO

(Please experiment in a test environment, not production.)

Rick James
  • 135,179
  • 13
  • 127
  • 222