4

I have a string stored in MySQL like this: یکی از Ø where my table's character set is utf8 and collation is utf8_general_ci.

When I retrieve the string from MySQL, the browser shows it like this: یکی از بهترین راه and it's ok (It is persian).

Note:

  • I used mysql_query("SET NAMES 'utf8_persian_ci'"); after connecting to MySQL.

  • I put <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> tag in head of each page.

 

Now, I need use mysql_query("SET NAMES 'utf8'");.

But after using it, the browser shows that string like this: یکی از Ø (as same as stored in MySQL).

How can I change my MySQL stored records and solve my problem? Or use some PHP code to convert the output encoding?

masoud
  • 55,379
  • 16
  • 141
  • 208
  • I didn't test it, but [utf8_encode()](http://php.net/manual/fr/function.utf8-encode.php) may help you – romainberger Jan 10 '13 at 11:22
  • 1
    @romainberger - What does ISO-8859-1 have to do with this? – Álvaro González Jan 10 '13 at 11:24
  • @romainberger: I used your suggestion to convert retrieved string from MySQL and then passed it to HTML. But It shows another weird text. – masoud Jan 10 '13 at 11:27
  • Please, check collation of DB, tables and columns involved in your query. It must be utf8_persian_ci for best results – Alexander Taver Jan 10 '13 at 11:54
  • Please note that the `mysql_query()` function is deprecated (along with other `mysql_xx()` functions). It is obsolete and insecure. You should consider switching to either the `mysqli_xxx()` funcs or the PDO library. See also http://stackoverflow.com/questions/12859942/why-shouldnt-i-use-mysql-functions-in-php – SDC Jan 10 '13 at 11:54

3 Answers3

4

Your problem is that your SET NAMES 'utf8_persian_ci' command was invalid (utf8_persion_ci is a collation, not an encoding). If you run it in a terminal you will see an error Unknown character set: 'utf8_persian_ci'. Thus your application, when it stored the data, was using the latin1 character set. MySQL interpreted your input as latin1 characters which it then stored encoded as utf-8. Likewise when the data was pulled back out, MySQL converted it from UTF-8 back to latin1 and (hopefully, most of the time) the original bytes you gave it.

In other words, all your data in the database is completely messed up, but it just so happened to work.

To fix this, you need to undo what you did. The most straightforward way is using PHP:

  1. SET NAMES latin1;
  2. Select every single text field from every table.
  3. SET NAMES utf8;
  4. Update the same rows using the same string unaltered.

Alternatively you can perform these steps inside MySQL, but it's tricky because MySQL understands the data to be in a certain character set. You need to modify your text columns to a BLOB type, then modify them back to text types with a utf8 character set. See the section at the bottom of the ALTER TABLE MySQL documentation labeled "Warning" in red.

After you do either one of these things, the bytes stored in your database columns will be the actual character set they claim to be. Then, make sure you always use mysql_set_charset('utf8') on any database access from PHP that you may do in the future! Otherwise you will mess things up again. (Note, do not use a simple mysql_query('SET NAMES utf8')! There are corner cases (such as a reset connection) where this can be reset to latin1 without your knowledge. mysql_set_charset() will set the charset whenever necessary.)

It would be best if you switched away from mysql_* functions and used PDO instead with the charset=utf8 parameter in your PDO dsn.

Francis Avila
  • 31,233
  • 6
  • 58
  • 96
  • The MySQL documentation I linked to goes into detail. Essentially, `ALTER TABLE mytable MODIFY mycolumn BLOB` then `ALTER TABLE mytable MODIFY mycolumn [original-type] CHARACTER SET utf8`. – Francis Avila Jan 10 '13 at 15:02
1

Probably what you need is to change the column Charset and Collation.

Try executing this query:

ALTER TABLE  `YOUR_TABLE_NAME` CHANGE  `YOUR_COLUMN_NAME`  `YOUR_COLUMN_NAME` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL

(In the example I assumed VARCHAR with a DEFAULT NULL VALUE)

If this works for you then you could insert in your my.conf file the code below to make utf8 the default charset:

default-character-set   = utf8
Alepac
  • 1,833
  • 13
  • 24
  • My table character set and collation is currently `utf8` and `utf8_general_ci`. Do you mean I should re-convert them? – masoud Jan 10 '13 at 11:57
  • Did you inert your data with this charset or with another one? I experienced this kind of problem in java and I had to reinsert all the data using the same charset on all the part involved in communication i.e. client, server and mysql. – Alepac Jan 11 '13 at 10:17
1

Possible solutions:

  1. Add "AddDefaultCharset UTF-8" to your site's .htaccess file.
  2. Add header('Content-Type: text/html; charset=utf-8'); to the top of your PHP file.
  3. Use utf8_encode() php function on the data that needs to be displayed in Persian.