I have a database table with a column where I categorized Persian alphabetic letters to select with MySQL WHERE later. everything works fine for all letters, but I have a problem while selecting letter (چ) which is stored as (Ù†) in database and (ن) which is stored as (Ú†).
first I thought the problem could be from inserting same letters, but when I checked in database , letters where stored with different encoding I mean (Ù†) and (Ú†).
when I zoom in these letters the tick over U is different. both letters are echoed correctly when I echo them on webpage, but when I choose to select letters WHERE letter = 'چ' it shows letters with (ن) too!!!
all of the webpages that insert and read data from DB are in UTF-8 and database collation is utf_persian-ci.
I cant find where the problem is with this? any help is appreciated,

- 172
- 2
- 11
-
@CharlotteDunois: thanks for your quick reply, but this didn't work!! – Abbas Nabilou Mar 26 '16 at 18:34
-
It shows both characters in the results, even though they are stored with different encodings, because those two characters are actually considered *equivalent* in the specified collation. – spencer7593 Mar 26 '16 at 18:43
-
@spencer7593 So how to fix this? – Abbas Nabilou Mar 26 '16 at 18:57
-
I thought the question you asked was *why* this was happening. The behavior is per the specification. It's similar to the behavior we see with a "case insensitive" collation, where the characters 'A' and 'a' are considered to be equivalent. How do we get different behavior? We use a different collation, one that is either "case sensitive" or "binary". Similarly, you could use a different collation, one that doesn't consider those two characters to be equivalent. – spencer7593 Mar 26 '16 at 19:02
-
"letters where stored with different encoding I mean (Ù†) and (Ú†)." - please elaborate on that. Where and how do you actually "see" Ù† and Ú†? What is the underlying storage charset of the tables/columns? – VolkerK Mar 26 '16 at 19:11
-
oh and: What is the connection charset? -> How do you connect to the MySQL server and is there anything that looks like it's informing mysql about the charset? (those commands usually contain the strings "charset", "names" and/or "collation") – VolkerK Mar 26 '16 at 19:16
-
@VolkerK: first the collation was utf8_general_ci , then after the answers I tested utf8_persian_ci and utf8_unicode_ci . I added new records after making changes. also webpages are saved in utf8 and have this tag : – Abbas Nabilou Mar 26 '16 at 19:21
-
Exactly how did you test/set utf8_general_ci and utf8_persian_ci? How do you connect to the MySQL server? (mysql_connect, new mysqli(), new PDO(), ...?) – VolkerK Mar 26 '16 at 19:24
-
I use phpmyadmin operation panel tool to change collation, it is new mysqli () – Abbas Nabilou Mar 26 '16 at 19:27
-
And what does `foreach( $mysqli->query("SHOW variables like 'character_set_c%'") as $r ) { echo join(',',$r), "\r\n"; }` print (right before you would normaly send your query), (the foreach loop works only with php version >= 5.4.0, since mysqli_result only implements Traversable since then) – VolkerK Mar 26 '16 at 19:34
-
your last comment is a little bit complicated ! I will test all these with a new database with right collations from the very beginning tomorrow to see what is going wrong exactly. thanks for all your support, I will tell you about the results. :) – Abbas Nabilou Mar 26 '16 at 19:49
-
Before you try anything, please have a read of http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html and http://stackoverflow.com/questions/279170/utf-8-all-the-way-through first. It _might_ very well have (almost) nothing to do with the charset/collation of the column/table/database. – VolkerK Mar 26 '16 at 19:58
2 Answers
This is a typical result of using a 'locale specific unicode encoding', in your case utf8_persian_ci
. I expect that if you switch your collation to utf8_unicode_ci
, it will work as expected.
If by any change you want to get rid of the case-insensitivity, you could switch to utf8_bin
.
For further reference see the MySQL documentation.

- 23,534
- 17
- 88
- 105
-
I tested both utf8_unicode_ci, and utf8_bin. none seem to be working! – Abbas Nabilou Mar 26 '16 at 18:51
-
if using `utf8_bin` gives you `ن` for a query with `WHERE letter = 'چ'`, your connection collation may be the fault (`utf8_bin` does not know about language, only about bytes. So different unicode code points are definitely not treated equal). – Jacco Mar 26 '16 at 19:06
-
The strange thing is that neither utf8_persian nor utf8_unicode treats čīm (چ) and Nūn (ن) as equal. see http://collation-charts.org/mysql60/mysql604.utf8_persian_ci.html and http://collation-charts.org/mysql60/mysql604.utf8_unicode_ci.middle_eastern.html . But probably both would treat "Ù†" and "Ú†" as equal, so he question _might_ be: what is the the "real" chrset/collation here. – VolkerK Mar 26 '16 at 19:09
-
@VolkerK, in that case, I expect the error to be entirely somewhere else, without knowledge of the entire setup, it is hard to pinpoint where things get misencoded (if it actually *is* an encoding error). – Jacco Mar 26 '16 at 19:14
Mojibake. (or not; see below) Probably:
- The bytes you have in the client are correctly encoded in utf8 (good).
- You connected with
SET NAMES latin1
(orset_charset('latin1')
or ...), probably by default. (It should have beenutf8
.) - The column in the tables may or may not have been
CHARACTER SET utf8
, but it should have been that.
For PHP:
⚈ mysqli interface: mysqli_set_charset('utf8') function.
⚈ PDO interface: set the charset attribute of the PDO dsn or via SET NAMES utf8.
The COLLATION
(eg, utf8_persion_ci
) is not relevant to Mojibake. It is relevant to how characters are ordered.
Edit
You say "is stored as (Ù†)" -- How do you know? Most attempts to see what is stored are subject to the client fiddling with the bytes. This is a sure way to see what is there:
SELECT col, HEX(col) FROM tbl ...
For چ
, the HEX should be DA86
for proper utf8 (or utf8mb4) encoding. If you get C39AE280A0
, then you have "double encoding". In general, Arabic/Persian/Farsi should be of the form Dxyy
.
If you read چ
while connected with latin1
, you will get Ù†
, which is DA86
in latin1 encoding (Ù
= DA
and †
= 86
).
ن
encodes as D986
.
Double Encoding
I used hex(col) to send query and got C399E280A0 for ن and C39AE280A0 for چ .
So, you have "double encoding", not "Mojibake".
C399
is utf8 for Ù
; E280A0
is utf8 for †
. Your character was changed from latin1 to utf8 twice. Usually the end result is invisible to the outside world, but messed up in the table. That is because the SELECT
decodes twice. However, since you are seeing only one decode, things are not that simple.
Caveat: You have a situation where I have not experimented; the advice I give you could be wrong.
Here's what probably happened.
- The client had characters encoded as utf8 (good) hex:
D986
; - When inserting, the application lied by claiming that the client had latin1 encoding. (This is the old default.);
D9
converted toÙ
and86
converted to†
; - The column in the table declared
CHARACTER SET utf8
(good). But now theÙ
is stored asC399
and the†
is stored asE280A0
, for a total of 5 bytes; - When reading the connection claimed utf8 (good) for the client, so those 5 bytes were turned back into
Ù†
; - The client dutifully said the utf8 data was
Ù†
.
Notice the imbalance between the INSERT
and the SELECT
. You tagged this PHP
; did PHP both write and read the data? Did it have a different setting for the charset for writing and reading?
The problem seems to be only in setting the charset for writing. It needed to be explicitly utf8, not defaulting to latin1.
But what about the data? If everything I said (about double encoding) matches what you have, then an UPDATE
can fix the data. See my blog for the details.

- 135,179
- 13
- 127
- 222
-
I added charset functions, but that was of no use. I think everything in MySQL side is ok. because it is showing both letters چ , ن correctly when echoed in php pages. if it was going to treat both identical, then it should have shown one for both. So the problem may be is something missing in php page or so!! – Abbas Nabilou Mar 27 '16 at 07:54
-
See my edit; maybe I have addressed your question. And please perform that `SELECT`. – Rick James Mar 27 '16 at 16:48
-
I used hex(col) to send query and got C399E280A0 for ن and C39AE280A0 for چ . is it possible to detect and fix the problem when we know this? – Abbas Nabilou Mar 28 '16 at 08:54
-
-
Dear @Rick I tested your code : UPDATE Tbl SET col = IFNULL(CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) USING utf8), col ) . on my columns and it worked like a charm, now I can see texts in perisan in DB and also I added mysqli->set_charset("utf8"); right after the connection. so the echoed text on php page shows in persian too. thank you very very much for time and effort you put on this. – Abbas Nabilou Mar 29 '16 at 06:48