1

I am working on a website with MySQL database on a Linux server.

Using phpMyAdmin, on the database, it says

  • MyISAM is the default storage engine on this MySQL server
  • latin1_swedish_ci

However, I have created all the tables with InnoDB and utf8_unicode_ci. I also checked that the table fields for all tables is utf8_unicode_ci.

Yet, when I mysql_fetch_array, and echo to stream, it gives gibberish. I had to explicitly set mysql_set_charset('utf8') for the text to appear correctly.

PHP version is 5.3.9; MySQL version is 5.1.70-cll - MySQL Community Server (GPL).

This is the first time I encountered this problem and I never had to set charset before.

What caused the text fetched by php mysql_* to be gibberish? Under what circumstance is it necessary to mysql_set_charset?

EDIT: This is not a question to attract suggestion to use alternative library e.g. mysqli, pdo. I just want to understand about this current situtation about the behavior of MySQL and charsets. Thanks.

Jake
  • 11,273
  • 21
  • 90
  • 147

3 Answers3

3

When exchanging data between two systems, there's always the question "what encoding will text be sent in?" "Text" is represented simply as binary data, just long strings of 1s and 0s. These could mean anything at all. There are hundreds of encoding schemes to encode different characters into different sequences of 1 and 0. If a system just receives a string of those without being told what encoding they represent, the system cannot know what characters those supposedly are.

Therefore, for any interface between two system, there needs to be a specification for what encoding strings are in. For MySQL, that's the API call mysql_set_charset. This is the way to tell MySQL what encoding strings will be in that PHP sends to it, and what encoding MySQL should returns strings in back to PHP. Without setting this explicitly some default encoding is assumed, which may not be the same encoding you're expecting, creating a mismatch and garbage characters.

Read What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text and Handling Unicode Front To Back In A Web App for more information.

deceze
  • 510,633
  • 85
  • 743
  • 889
2

It's wise to always call it once connection is established, to ensure your app will not be affected by broken server settings. Because you can have your tables in i.e. UTF8 and send your data in UTF8 but if the connection is not UTF8 (because of i.e my.ini settings) then you end up with mess. So either call mysql_set_charset() or execute SET NAMES charset query, and you will be on safe ground. And since it is done once per connection, it's basically no cost operation anyway

Marcin Orlowski
  • 72,056
  • 11
  • 123
  • 141
  • ".. if the connection is not UTF8". Is this connection you talking about belonging to MySQL, or PHP-MySQL? – Jake Aug 12 '13 at 10:18
  • There is nothing actually wrong if you have your tables in UTF8 and send your data not in UTF8, as long as current table encoding support all the sent characters. – Your Common Sense Aug 12 '13 at 10:28
  • actually it is wrong, because data gets converted during transmission and what ends wrong in database. – Marcin Orlowski Aug 12 '13 at 11:06
1

mysql_set_charset functions sets the default character set for the current connection. Even though your data is stored in unicode on the server, it still requires a compatible connection character set to transmit data accurately.

If you execute SHOW VARIABLES LIKE 'character\_set\_%' statement in mysql it will show various sharacter sets used by the server and current connection. Ideally they should all match and be utf8.

More information: MySQL Connection Character Sets.

dezlov
  • 840
  • 8
  • 20