1

Test SELECT:

MySQL [chuangwai]> select ar_detail from items limit 1\G;
*************************** 1. row ***************************
ar_detail: {"طراز": "فساتين قفطان", "المواد": "الشيفون"}

and you can see the Arabic characters displayed correctly.

Then I check encoding:

MySQL [chuangwai]> select * from information_schema.SCHEMATA\G;
*************************** 2. row ***************************
              CATALOG_NAME: def
               SCHEMA_NAME: chuangwai
DEFAULT_CHARACTER_SET_NAME: latin1
    DEFAULT_COLLATION_NAME: latin1_swedish_ci
                  SQL_PATH: NULL

In another SO post, BalusC said:

If you're trying to store non-Latin characters like Chinese, Japanese, Hebrew, Cyrillic, etc using Latin1 encoding, then they will end up as mojibake.

As you see, it is not my case. Could anyone please give me an explanation why I can store Arabic characters with latin1 encoding? Is it necessary for us to switch the encoding of our DB from latin1 to uft8?


EDIT: Okay, I just found the encoding of items is uft8...

MySQL [chuangwai]> SELECT TABLE_COLLATION
    -> FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_NAME = 'items';
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_unicode_ci |
+-----------------+
Sayakiss
  • 6,878
  • 8
  • 61
  • 107

2 Answers2

1

Most likely explanation is that your table is UTF8, even if your schema is ASCII. Try

SELECT TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_NAME = 'items';

In my case, a UTF8 table gives me: utf8_general_ci. You might see utf8mb4_general_ci instead (that's actually better than utf8_general_ci for a variety of reasons)

Now, as to your question "is it necessary to switch encodings?" The answer is "technically, no, but it would probably be a good idea." So long as you include encoding in your table definitions, then you won't need to worry about the schema encoding. Still, it would be better to switch encoding so that you don't need to worry about accidentally munging data later.

cwallenpoole
  • 79,954
  • 26
  • 128
  • 166
1

Please provide SHOW CREATE TABLE. It may be that the table's default is one thing, but the columns are another.

You need to announce to MySQL that the bytes you have in the client are utf8. (They cannot be latin1, much less ascii, since those charsets do not have the characters in question.)

You need the column to be declared CHARACTER SET utf8 (or utf8mb4). Then all will be well.

But you managed to get somewhere with latin1? Well, that is an accident.

Case 1: You lie about what is in the client and what to store in the table columns. But latin1 is forgiving; it essentially stores bytes without regard for what they mean.

Case 2: You get "double encoding", and the characters end up storing as 4 bytes. But they magically come back out looking OK.

Case 3: Mojibake is another way to do things wrong. But since the text is retrieved intact, I don't think you have this case.

Case... (There are other cases; see the link below.)

In any case, ORDER BY and WHERE are likely to sort or filter things incorrectly.

See "Best Practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored

Rick James
  • 135,179
  • 13
  • 127
  • 222