
I am receiving a syntax error when I try to load a mysqldump file.

My question has a couple of parts:

(1) Why is mysql unable to properly read the file that mysqldump output?
(2) How can I make mysql read in the relevant data from a file?

Here are some details:

`mysqldump -u username -p dbname > mydumpfile.sql` goes fine (apparently).

`mysql -u testuser -p testdbname < mydumpfile.sql` gets through only part (about 1/3) of the file, then gives a syntax error:

ERROR 1064 (42000) at line 249: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'randomimproperlydisplayingjapanesetext',''),(508715,134707' at line 1

The text shown in the syntax error appears shortly after the beginning of a new INSERT statement.

The (big) INSERT statement on the previous line is not being entered into the database.

The data is from a database with Japanese text, and the column has utf8_general_ci collation.

MySQL version 5.6.23 on Windows XP.

Here are the other relevant variables (I think):

mysql> show variables like '%char%';
+--------------------------+------------------------------+
| Variable_name            | Value                        |
+--------------------------+------------------------------+
| character_set_client     | sjis                         |
| character_set_connection | sjis                         |
| character_set_database   | sjis                         |
| character_set_filesystem | binary                       |
| character_set_results    | sjis                         |
| character_set_server     | sjis                         |
| character_set_system     | utf8                         |
| character_sets_dir       | C:\mysql\share\charsets\     |
+--------------------------+------------------------------+

Edit: Based on the answer below, I determined that the mysqldump file does contain a SET NAMES line setting the connection to utf8.

Here is the SHOW CREATE TABLE trouble_table result:

CREATE TABLE `trouble_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `version_id` int(11) DEFAULT NULL,
  `myutf8column` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `mysjisenumcolumn` enum('一式','*',[a few other Japanese characters]) CHARACTER SET sjis DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `version_id` (`version_id`)
) ENGINE=InnoDB AUTO_INCREMENT=946033 DEFAULT CHARSET=utf16

So, the table character set is utf16 (I forget why), with one utf8 column and one sjis column. In the mysqldump file I can read all the values, though, so it seems that within the dump file everything is encoded the same way.

SELECT HEX(myutf8column) seems to confirm that myutf8column has utf8 encoding (it starts with the codes mentioned below, i.e. E383xx, Ewxxyy), and mysjisenumcolumn has hex values starting with 95, so I guess it's probably sjis.

Also, after reading this SO question, I checked and set max_allowed_packet to 33554432 instead of the default, but this did not change the problem.

The part of the table that does load has no noticeable problems with the inserted data, but there's too much data for me to really look through either the db data or the mysqldump file and spot any 'weird' characters that may be causing mysql to choke. (The mysqldump file is over 50MB, so it's not enormous by db standards, but big enough to be very troublesome to read; Notepad++ and emacs seem helpless with it.)

One more thing: I'm nervous about changing the column collation because I don't want to lose any data (if the current encoding is wrong, is it safe to change it to another encoding?). It took a long time to parse in the original data, hence I am trying to make a backup copy. Edit: Based on the answer below, I am no longer nervous about changing the collation, because it is only a rule for comparison; rather, I am nervous about changing the character sets.
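For reference, a sketch of that distinction using the table names from this question (syntax per the MySQL manual; run at your own risk and back up first): changing only the collation rewrites no stored bytes, while converting the character set re-encodes them.

```sql
-- Collation-only change: comparison/sorting rules change, stored bytes do not.
ALTER TABLE trouble_table
  MODIFY myutf8column VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Character set conversion: MySQL re-encodes the stored bytes.
-- This is the risky operation if the current encoding label is wrong.
ALTER TABLE trouble_table CONVERT TO CHARACTER SET utf8;
```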

By the way, it is not a big deal if mysql needs to simply skip a few problematic rows.

user4652310
  • "it seems that in the dump file all are encoded the same way" -- You mean that some looks properly encoded in sjis, some in utf8? – Rick James Apr 21 '15 at 01:46
  • Going back to the error, can you find the characters just before "'randomimproperlydisplayingjapanesetext',''),(508715,134707'" ? That is where the problem is. Or, perhaps the escaping in that text is fouled up (sjis) probably has "'" as one byte of a valid character. This might indicate a bug in mysqldump when dumping sjis. – Rick James Apr 21 '15 at 01:50
  • @RickJames, (1) regarding your encoding question, I mean that basically all characters within the mysqldump file are legible, and therefore encoded the same way *within* the mysqldump file (sorry, maybe that's obvious). (2) The text before the error is `INSERT INTO 'troubletable' VALUES (x,x,x,x,x),(508715,134707' `, but I think the problem is the line *before* the characters shown in the error statement, i.e. somewhere within the INSERT statement for 15000 rows. Those records are not being inserted into the DB. Right now I am deleting 1000 records at a time to find the trouble chars. – user4652310 Apr 21 '15 at 04:45
  • When the error says `the right syntax to use near 'xyz...'`, almost always means either that `xyz` is bad or that whatever was _immediately_ before `xyz` is bad. If possible, get the hex of the bad characters (if not ascii). – Rick James Apr 21 '15 at 05:48
  • Update: I managed to load the whole file. I ran mysqldump with --default-character-set=sjis, and the whole file loaded in without a problem after that. The default-character-set for the server and the client were both set to sjis in the configuration file, and setting mysqldump to match solved the issue. Strange, because I have been using the same procedure with multiple databases with Japanese text, and only this one seems to be causing problems...? And that's only after loading half a million records; otherwise the issue probably wouldn't have been noticed. – user4652310 Apr 21 '15 at 05:50

2 Answers


In my case it was caused by a version difference between the exporting and importing MySQL servers. The exporting MySQL was 5.7.x (Ubuntu 16.04), but the importing one was 5.5.x (Ubuntu 14.04). After upgrading the importing server to 5.7.x by following this guide, it worked.

CoderGuy123

sjis and utf8_general_ci are not related. While it is possible to use sjis in the client and utf8 in the tables, it seems like an unnecessary mixture.

sjis and utf8 are "CHARACTER SETs".
sjis_japanese_ci and utf8_general_ci are corresponding "COLLATIONs".
The problem at hand concerns CHARACTER SETs.

Check the bytes (or the source) of the Japanese characters you are trying to insert -- verify whether they are 2-byte sjis encodings or 3-byte utf8 encodings.

The HEX for Japanese in utf8:

  • E381yy -- Hiragana
  • E383yy -- Katakana
  • Ewxxyy -- Kanji

The HEX for sjis is practically any combination, so hard to "recognize".
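A quick way to see the difference outside MySQL (a sketch; assumes a UTF-8 terminal and that iconv is available, and uses sample characters of my own choosing):

```shell
# Hex of the same Japanese characters in utf8 vs sjis.
printf 'あ' | od -An -tx1                               # hiragana, utf8: e3 81 82 (E381yy)
printf 'ナ' | od -An -tx1                               # katakana, utf8: e3 83 8a (E383yy)
printf '日' | od -An -tx1                               # kanji,    utf8: e6 97 a5 (Ewxxyy)
printf 'あ' | iconv -f UTF-8 -t SHIFT_JIS | od -An -tx1 # same hiragana in sjis: 82 a0
```

Comparing these prefixes against the HEX() output from the table tells you which encoding the stored bytes actually are.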

Similarly check the data in the table(s) with SELECT col, HEX(col) .... Also do (and provide for us) SHOW CREATE TABLE for one of the tables.
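For instance, with the table and column names taken from this question, the checks might look like:

```sql
-- Inspect stored bytes per column: E3xxyy-style prefixes suggest utf8 kana,
-- while two-byte codes such as 82A0 suggest sjis.
SELECT myutf8column, HEX(myutf8column),
       mysjisenumcolumn, HEX(mysjisenumcolumn)
FROM trouble_table
LIMIT 10;

SHOW CREATE TABLE trouble_table;
```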

Back to the problem...

When using mysqldump, did you have --set-charset (and not --skip-set-charset)? If so, there should be a SET NAMES in the dump file. Check for it. It should be near the top. If it is there, we need to dig further to figure out what is going wrong.

If it is not there, you can compensate for its absence. On the mysql statement use --default-character-set=xx, where xx is either sjis or utf8, depending on which encoding is in the dump.
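As a sketch (user, database, and file names copied from the question; whether xx should be sjis or utf8 is an assumption you must verify against the dump itself), the key point is that the same character set is named explicitly on both sides:

```shell
# Dump and reload with an explicit, matching character set.
mysqldump -u username -p --default-character-set=sjis dbname > mydumpfile.sql
mysql -u testuser -p --default-character-set=sjis testdbname < mydumpfile.sql
```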

If these clues do not suffice, please edit your Question with answers to the questions I pose.

Rick James
  • Cheers for the clues, I'm in a much better position for troubleshooting; still no luck though. I edited my question to include further details about `SET NAMES` and the table and column character sets. – user4652310 Apr 21 '15 at 01:24
  • This answer is basically right, but just to emphasize: the character set for mysqldump needs to match the character set for the mysql input. If default-character-set is set in the mysql config file under the [mysql] section, I recommend setting the same option under the [mysqldump] section to avoid issues like this. Still not sure why half a million records loaded with no problem, though... – user4652310 Apr 21 '15 at 06:37