
I am importing external data into my MySQL database using PHP scripts. I set the database's default charset to utf8 with this query:

ALTER DATABASE DEFAULT CHARSET 'utf8';

Then I ran a query to list all the charset variables:

SHOW VARIABLES LIKE 'character_set%';

The output is:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

As we can see, character_set_database is set to utf8, but when I check the encoding from my PHP script with

echo $charset = mysql_client_encoding($cn);

the output is latin1. According to the query above, latin1 applies only to the server. Can anyone tell me what exactly I am missing? I am unable to store Chinese and Japanese characters in the database.

EDIT

I am importing an external database that contains Unicode text such as 我的上网主页 and 嶏紞鎴戠殑 in Chinese, Japanese, and other languages. But when I import the data into my tables, I get ????? instead of those characters. How can I encode these characters? Is the encoding UTF-8 or UTF-16, and how can I tell which encoding supports these characters?

Astha
  • You should better explain your real problem, because `mysql_client_encoding` returns only the charset that has been set on the moment of the connection. – zerkms Feb 22 '12 at 04:41
  • mysql_set_charset() right after you connect; also, setting the default for the database won't change existing tables. – miki Feb 22 '12 at 04:42
  • how can i permanently set charset to utf8 ? – Astha Feb 22 '12 at 04:52
  • @Astha: are you sure the data doesn't break on the reading step? – zerkms Feb 22 '12 at 04:52
  • @zerkms "doesn't break" means ? – Astha Feb 22 '12 at 04:54
  • @Astha: I mean - are you sure you read the data correctly? And are you sure the data isn't broken? – zerkms Feb 22 '12 at 04:57
  • @zerkms yes. Because there are thousands of rows which are coming normally. Only the data which are in different lang. is stored in the format of ???? in db instead of real characters. Else every row of the db table is fine. – Astha Feb 22 '12 at 04:59
  • what the heck you mean with "I am importing"? What are you doing **certainly**? throwing your data with shovel or what? – Your Common Sense Feb 22 '12 at 05:06
  • @Astha you probably stored JP/CN lang with the wrong encoding (latin1), now you are trying to convert it to UTF-8? – Matthew Scragg Feb 22 '12 at 05:16
  • @MatthewScragg yes you are right. In my case every friday i get a new set of data which i replace with the older one in few tables of my database. So what i want the next time when i import complete data again ??? should be encoded as they really are in JP/CN lang. – Astha Feb 22 '12 at 05:33
  • @Astha First thing you need to make sure is that your fields are set to UTF-8. Setting a database or table to UTF-8 will only affect new tables and fields respectively. You cannot just do an alter table and the existing data will be magically converted. Existing data will likely be corrupted. This is very important to realize. The safest thing to do is make a copy of the table structure and make sure the appropriate fields are UTF-8. Make sure you query mysql "SET NAMES 'utf8'" before you do the insert. On your website, you will have to make sure your encoding is set to UTF-8. – Matthew Scragg Feb 22 '12 at 05:41

2 Answers

3

I think character_set_database only sets the default character set for tables created in that database, so having it set to utf8 won't help with the connection. I suggest the following:

Every time I initialize my database connection, I execute $db->query("SET NAMES 'utf8'");

This answer discusses SET NAMES: https://stackoverflow.com/a/1650834/1221902

More on set names for the critics

Depending on your MySQL/PHP version, a dedicated function may be available that is a better alternative to the "SET NAMES 'utf8'" query.

A SET NAMES 'x' statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

From the MySQL 5.1 manual (a lot of people still use 5.1): http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html

The character_set_results system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, and result metadata such as column names.
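As a rough sketch of the per-connection approach above: the snippet assumes the mysqli extension (the question uses the older mysql_* functions, so this is a substitution) and placeholder credentials. `connect_utf8()` and `charset_mismatches()` are illustrative names, not part of any library. It issues SET NAMES right after connecting, then reads the session variables back to check that the three variables SET NAMES controls actually changed.

```php
<?php
// Hypothetical helper: given name => value pairs from SHOW VARIABLES,
// return the SET NAMES-controlled variables whose value is not accepted.
// Assumption: some newer MySQL servers report utf8 as "utf8mb3",
// so both spellings are accepted by default.
function charset_mismatches(array $vars, array $accepted = array('utf8', 'utf8mb3'))
{
    $controlled = array(
        'character_set_client',
        'character_set_connection',
        'character_set_results',
    );
    $bad = array();
    foreach ($controlled as $name) {
        $value = isset($vars[$name]) ? $vars[$name] : null;
        if (!in_array($value, $accepted, true)) {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Hypothetical wrapper: connect, set the connection charset, verify it.
// Assumes ext/mysqli and a reachable server; all arguments are placeholders.
function connect_utf8($host, $user, $pass, $dbname)
{
    $db = new mysqli($host, $user, $pass, $dbname);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }

    // Must run on every new connection, before any INSERT or SELECT.
    if (!$db->query("SET NAMES 'utf8'")) {
        throw new RuntimeException('SET NAMES failed: ' . $db->error);
    }

    // Read back the session-level values for this connection only.
    $vars = array();
    $res = $db->query("SHOW SESSION VARIABLES LIKE 'character_set_%'");
    while ($row = $res->fetch_assoc()) {
        $vars[$row['Variable_name']] = $row['Value'];
    }
    $res->free();

    if (charset_mismatches($vars)) {
        throw new RuntimeException('Connection charset is not utf8');
    }
    return $db;
}
```

Passing the table from the question to `charset_mismatches()` yields no mismatches. That is expected: those values describe the session in which SHOW VARIABLES was run, not the connection opened by the PHP script, which is why the check has to run on the PHP connection itself.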

Matthew Scragg
-1

You are missing the client encoding.
The variables above are only server-side ones; you have to set the client encoding using

mysql_set_charset()

As you stated that you are using an obsolete PHP version, the only option you have (besides upgrading PHP or switching drivers) is a SET NAMES <actual data encoding> query.

As it turned out, your problem is not in setting the connection encoding but in some mysterious "importing". As you provide no details, I can only guess.
If you are importing a MySQL dump, check the table definitions. It is very likely that the charset is wrong there. You can simply change it with search and replace.
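The search-and-replace idea could look roughly like this. It is a sketch under assumptions: the dump declares charsets in the usual `DEFAULT CHARSET=latin1` (table) and `CHARACTER SET latin1` (column) forms, and `rewrite_dump_charset()` is a hypothetical helper name. It only rewrites the declarations; it does not transcode the data, so the dump file itself must already contain UTF-8 bytes.

```php
<?php
// Hypothetical helper: rewrite charset declarations in a SQL dump string.
// It changes what the tables declare, not the bytes of the stored data.
function rewrite_dump_charset($sql, $from = 'latin1', $to = 'utf8')
{
    $from = preg_quote($from, '/');
    $patterns = array(
        // Table level, e.g. "DEFAULT CHARSET=latin1"
        '/\b(DEFAULT\s+CHARSET\s*=\s*)' . $from . '\b/i',
        // Column level, e.g. "CHARACTER SET latin1"
        '/\b(CHARACTER\s+SET\s+)' . $from . '\b/i',
    );
    // ${1} keeps the matched keyword part and swaps only the charset name.
    return preg_replace($patterns, '${1}' . $to, $sql);
}

$before = "CREATE TABLE t (name varchar(10) CHARACTER SET latin1) "
        . "ENGINE=MyISAM DEFAULT CHARSET=latin1;";
echo rewrite_dump_charset($before), "\n";
```

This is deliberately blunt. It would also rewrite a matching phrase that happens to appear inside row data, and it leaves `COLLATE latin1_...` clauses and any `SET NAMES` line in the dump header untouched, so those need checking by hand before the import.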

Your Common Sense
  • As long as it is `character_set_client | utf8` on the server, it is doubtful that it would change anything – zerkms Feb 22 '12 at 04:44
  • i am using php 5.1 and it does not support this function. – Astha Feb 22 '12 at 04:48
  • @Astha: you need to upgrade to at least the latest 5.2.x. The latest 5.1 was released in Aug 2006, about 5.5 years ago. – zerkms Feb 22 '12 at 04:51
  • @Astha well, your only choice then is a `SET NAMES` query. It won't change the client's encoding but will tell the server which encoding the incoming data is in. Besides upgrading, of course – Your Common Sense Feb 22 '12 at 04:59