1

I have two character set bugs that are preventing me from seeding my database with an old database's contents.

1. Error Code 1366: Incorrect string value "\xDB"

I've tried changing the CHARACTER SET of my TABLE to utf8mb4.

I've looked into what \xDB would be as a character type, and stumbled upon this. This lead me to believe it's a Javascript escape format. However, I can't seem to find a MySQL CHARACTER SET that supports this.

Even if I do find a CHARACTER SET that supports Javascript escape format, wouldn't it be best to convert everything to the more standard and all encompassing utf8mb4? How would this be done?

2. Error Code 1300: Invalid utf8mb4 character string

This is the offending character: ®

Again, I've tried similar things as mentioned above, with little success.

EDIT:

I set the CHARACTER SET in a new database as follows:

CREATE TABLE `products` (
    `product_id` INT NOT NULL AUTO_INCREMENT UNIQUE,
    `product_line` VARCHAR(255) NOT NULL,
    `product_num` VARCHAR(255) NOT NULL,
    `description` LONGTEXT NOT NULL,
    PRIMARY KEY (`product_id`)
) CHARACTER SET utf8mb4 ;

EDIT 2:

I didn't know this made a difference, but the problem was arising while doing a LOAD DATA INFILE as follows:

USE `inventory`;

LOAD DATA INFILE
'all_products.csv'
INTO TABLE `products`
    FIELDS
        TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '"'
    LINES
        TERMINATED BY ',\r\n'
IGNORE 1 ROWS
(`product_line`,`product_num`,`description`);
Neil
  • 2,004
  • 3
  • 23
  • 48
  • Are you aware that changing the table's default character set does not change the character set of the existing columns? You need `ALTER TABLE ... CONVERT TO CHARACTER SET ...` for that. Refer to https://dev.mysql.com/doc/refman/8.0/en/alter-table.html under heading "Changing the Character Set" – Bill Karwin Aug 31 '18 at 21:04

2 Answers2

2

Do not blindly do ALTER ... CONVERT TO ...; it could make things worse. If you have already made a "correction" to the already incorrect situation, then any advice I give below is not likely to undo the two problems you have.

® and DB do not map to one-another.

DB, in various character sets represents:

                                    macce   2  1 'Ř'
                           cp1250, latin2   2  1 'Ű'
                           cp1257, latin7   2  1 'Ū'
                     dec8, latin1, latin5   2  1 'Û'
                                      hp8   2  1 'Ü'

Meanwhile, AE:

cp1250, cp1251, cp1256, cp1257, geostd8,
           hebrew, latin1, latin5, latin7   2  1 '®'
                                    macce   2  1 'ģ'
                                      hp8   2  1 'Û'
                                   latin2   2  1 'Ž'
                    cp850, cp852, keybcs2   2  1 '«'

The usual error involves specifying latin1 in one place, but utf8/utf8mb4 in another.

(For the sake of this discussion, utf8 and utf8mb4 act the same.)

This discusses typical problems. I am worried that either there has been an incorrect correction that muddied the situation or the single character is not enough for diagnosing. See this for the available fixes and which cases they apply to.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • This lead me a solution, for this I award you the bounty. Is there a resource you @Rick used to find what character codes map to the other? I noticed that both codes I am using map to `latin1`. Thus, for those wondering [this](https://stackoverflow.com/questions/41922029/load-data-infile-invalid-ut8mb4-character-string) question lead me to realize I need to set the `character set` in the `LOAD DATA INFILE` – Neil Sep 04 '18 at 20:47
  • @Neil - I have collected the latin1:utf8 encodings here: http://mysql.rjweb.org/doc.php/charcoll#8_bit_encodings The list of mappings for DB and AE came from a special script that went through the character sets doing `CONVERT(CONVERT(UNHEX('AE') USING ...) USING utf8mb4)` (plus some pretty-printing). – Rick James Sep 05 '18 at 01:44
0

Have you tried changing the collation to utf8_unicode_ci? Also, there are two important settings in your my.cnf file. They are:

collation-server
character-set-server

Also, I totally agree with @Rick you should avoid ALTER ... CONVERT TO .... Try to execute before inserting:

SET NAMES utf8_unicode_ci

Does this help?

entpnerd
  • 10,049
  • 8
  • 47
  • 68
Igor Skobelev
  • 339
  • 1
  • 6