0

Recently I noticed that in my database some unicode characters (mostly Japanese) turned into unreadable garbage. For example this:

「オリジナル」 桜庭わかな - Far wandering~7つめのアレクトラ

turned into

「オリジナル〠桜åºã‚ã‹ãª - Far wandering~7ã¤ã‚ã®ã‚¢ãƒ¬ã‚¯ãƒˆãƒ©

The data is obtained from YouTube API (using PHP) then placed into my database. I suppose that Google may have broke something for a short time then fixed it. However now I am unable to obtain much data using Youtube API due to closed channels and/or removed videos.

I already tried:

  • converting columns into latin1/blob/binary => utf8
  • dumping few rows into file then opening sql file in Notepad++ and changing character encoding (In hope that charset was set wrong)
  • asking for the oldest backup restore - it was already garbaged

Is there a way to restore it to the previous state?

I was told that mysql on my server is using xtradb instead of InnoDB. Below are information about my database and table

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

mysql> SHOW FULL COLUMNS FROM my_table;
+--------------+---------------+-----------------+------+-----+-------------------+----------------+---------------------------------+---------+
| Field        | Type          | Collation       | Null | Key | Default           | Extra          | Privileges                      | Comment |
+--------------+---------------+-----------------+------+-----+-------------------+----------------+---------------------------------+---------+
| id           | int(11)       | NULL            | NO   | PRI | NULL              | auto_increment | select,insert,update,references |         |
| timestamp    | timestamp     | NULL            | NO   |     | CURRENT_TIMESTAMP |                | select,insert,update,references |         |
| etag         | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| video_id     | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| published    | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| title        | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| description  | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| thumb        | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| channel_id   | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| channel_name | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| category     | decimal(10,0) | NULL            | YES  |     | NULL              |                | select,insert,update,references |         |
| duration     | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
| definition   | text          | utf8_general_ci | YES  |     | NULL              |                | select,insert,update,references |         |
+--------------+---------------+-----------------+------+-----+-------------------+----------------+---------------------------------+---------+
13 rows in set (0.00 sec)
moh
  • 1
  • Side note, you could probably use `varchar` on a lot of those columns, for example `video_id`, `published`, `title`, `thumb`, `channel_id`, etc etc. The `text` type meant for large blocks of text. – Mike Barwick Dec 12 '13 at 19:06
  • Have you tried setting your document meta to: ``? – Mike Barwick Dec 12 '13 at 19:08
  • Sidenote after closure: No, in my experience, there's no way to go back to the UTF-8 characters. The representation was already corrupted (probably in your DB tunnel transit) and they no longer contain the same character data. – rockerest Dec 12 '13 at 21:20
  • @rockerest Well then there is nothing more I can do. Thanks. – moh Dec 15 '13 at 12:02

0 Answers0