4

I have a large text file containing Arabic text data. When I try to load it into a MySQL table, I get error saying Error code 1300: invalid utf8 character string. This is what I have tried so far:

LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

I tried to ignore this error, but it does not work. I have tried LOCAL INFILE but it did not work, too. My database was created using DEFAULT CHAR SET UTF8 and DEFAULT COLLATE utf8_general_ci. The text file is utf-8 encoded.

I do not want the records which contain invalid utf8 characters. So how I can load the data with ignoring the records containing such invalid chars?

Thank in advance!

Mohammed
  • 1,364
  • 5
  • 16
  • 32

2 Answers2

3

It would help to have the HEX of the naughty character.

A possible approach to reading all the text, then dealing with any bad characters:

  1. Read into a column of type VARBINARY or BLOB.

  2. Loop through the rows, trying to copy to a VARCHAR or TEXT column.

Another plan is to use utf8mb4 instead of utf8. It could be that the bad character is an Emoji or Chinese character that will work in utf8mb4, but not utf8.

Ignore errors

This may let you ignore errors:

SET @save := @@sql_mode;
LOAD DATA ...;
SET @@sql_mode := @save;
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • In Python, I catch any exception and decide the action to do when such exception occurs, e.g. `except ValueError: continue` and so on. But I am not sure if this technique is available in MySQL. – Mohammed Nov 20 '16 at 21:40
  • `Update:` I had to check all the characters that are available in the text file using Python `chars = {c for c in textfile}`. I found a few more strange chars which have been cleaned using `regex`. After that, the `MySQL load statement` worked fine with `utf-8` and `text` as types of text columns. Thanks are extended to `Rick James` for your kind help. I am marking the answer as accepted. – Mohammed Nov 21 '16 at 15:50
  • 1
    @RickJames. You have an enormous about of respect from me but your answer is unhelpful. The question isn't about bad characters. It's about ignoring bad characters on import. – Altimus Prime Jul 31 '19 at 23:29
  • @AuntJemima - Point made. I added to my answer. In reviewing `sql_mode`, I wonder if he is hitting a bad date or something unrelated to Arabic. – Rick James Aug 01 '19 at 00:39
  • @RickJames. Thank you. It doesn't work, but thanks. MySQL 5.7 doesn't have an sql_mode that skips bad characters and the special term `IGNORE` still throws an error instead of a warning, which doesn't do any good either. https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html. Is this the same in future releases of MySQL? – Altimus Prime Aug 01 '19 at 01:24
  • @AuntJemima - I am _guessing_ that `strict_mode` (or maybe something else) would be the difference between "ignore" and "error". `sql_mode` tends to be augmented in new versions, especially 8.0. And the default value tends to change -- leading to incompatibilities. Beware. – Rick James Aug 01 '19 at 01:50
3

I have this problem when try use MySQL 5.7.14, too.

I returned to MySQL 5.6 and this problem is disappeared