0

I dumped a lot of string data from a SQL Server database into a text file as INSERTS on my Windows10 workstation using the SQLCMD via the cygwin command line.

It loads into mysql fine, but in mariadb I get this error:

ERROR 1366 (22007) at line 384: 
    Incorrect string value: '\x87ao Su...' for column 'description' at row 1995

My SQL Server DB has these settings:

SELECT DISTINCT C.collation_name 
FROM   sys.tables AS T 
       INNER JOIN sys.columns C 
               ON T.object_id = C.object_id 
WHERE  collation_name IS NOT NULL; 

collation_name: Latin1_General_CI_AS

SELECT DISTINCT C.collation_name 
FROM   sys.tables AS T 
       INNER JOIN sys.columns C 
               ON T.object_id = C.object_id 
WHERE  collation_name IS NOT NULL; 
collation_name: Latin1_General_CI_AS

and this is MySQL (where it inserts happily):

SHOW FULL COLUMNS FROM forecast
--------------

+--------------------+------------------+-----------------+------+-----+-------------------+----------------+---------------------------------+---------+
| Field              | Type             | Collation       | Null | Key | Default           | Extra          | Privileges                      | Comment |
+--------------------+------------------+-----------------+------+-----+-------------------+----------------+---------------------------------+---------+
| description        | varchar(255)     | utf8_unicode_ci | YES  |     | NULL              |                | select,insert,update,references |         |

SELECT @@character_set_database, @@collation_database, @@collation_connection
--------------

+--------------------------+----------------------+------------------------+
| @@character_set_database | @@collation_database | @@collation_connection |
+--------------------------+----------------------+------------------------+
| utf8                     | utf8_unicode_ci      | utf8_general_ci        |
+--------------------------+----------------------+------------------------+
1 row in set (0.01 sec)

and this is from MariaDB where it gives that error above:

SHOW FULL COLUMNS FROM forecast
--------------

+--------------------+------------------+-----------------+------+-----+---------------------+----------------+---------------------------------+---------+
| Field              | Type             | Collation       | Null | Key | Default             | Extra          | Privileges                      | Comment |
+--------------------+------------------+-----------------+------+-----+---------------------+----------------+---------------------------------+---------+
| description        | varchar(255)     | utf8_unicode_ci | YES  |     | NULL                |                | select,insert,update,references |         |
+--------------------+------------------+-----------------+------+-----+---------------------+----------------+---------------------------------+---------+
10 rows in set (0.01 sec)

MariaDB [tardis]> SELECT @@character_set_database, @@collation_database, @@collation_connection;
--------------
SELECT @@character_set_database, @@collation_database, @@collation_connection
--------------

+--------------------------+----------------------+------------------------+
| @@character_set_database | @@collation_database | @@collation_connection |
+--------------------------+----------------------+------------------------+
| utf8                     | utf8_unicode_ci      | utf8_general_ci        |
+--------------------------+----------------------+------------------------+
1 row in set (0.01 sec)

As far as I can tell, MariaDB should behave the same as MySQL but it doesn't.

What do I have to do?

I could convert the text with a SQL Server function - at the moment I am just escaping \' characters - but how?

[UPDATE] Just discovered that MariaDB doesn't have any problems with the data when I run the import SQL commands from file.

It's only when I pipe it directly on the command line from SQLCMD to mysql.exe that I get the error:

$ SQLCMD.EXE -i "c:\dev\workspace\sql\generate-inserts-for-migration.sql" \
       -h-1 -S "myhost" -d MyDBName -r1 -W \
       -b -f o:65001 | mysql -h myhost.amazonaws.com -u adam -p 

I wanted to do it like that so that I don't have to create a 10GB SQL file on my hard drive, or I could even run it in a docker container (with limited GB space) on the same network on the cloud.

Adam
  • 5,215
  • 5
  • 51
  • 90

2 Answers2

0

You need to tell sqlcmd to convert your output text to unicode as you create your SQL file dump.

Try using the -u flag when you dump your file.

Or try using this flag when you import it, to translate from latin-1 to utf-8

 -f i:1252,o:65001

1252 is microsoft-speak for Latin-1 and 68001 for utf-8.

And, by the way, for futureproofing your MySQL / MariaDB data, you should consider using utf8mb4 as your character set rather than just utf8. Here's an explanation. What is the difference between utf8mb4 and utf8 charsets in mysql?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • `sqlcmd` is not playing ball. Neither `-u` nor `-f o:65001` changed the output into something mariadb could handle. I searched through the dumped file and couldn't find `\x87ao` or anything like it. Is it possible that it worked for mysql because I saved the dump to file with `> dump.sql` but this time for mariadb I'm just doing `SQLCMD | mysql` on the command line? I figured not writing the file would save hours. – Adam May 17 '18 at 13:47
  • `\x87` is potentially the character `‡` in Latin1. – Michael - sqlbot May 18 '18 at 01:23
  • in Latin1, C-cedilla is \xC7, according to https://en.wikipedia.org/wiki/Windows-1252 . You data is probably in cp850, which is default OEM charset for Western Europe. C-cedill is \x87 there https://en.wikipedia.org/wiki/Code_page_850 – Vladislav Vaintroub May 23 '18 at 23:19
0

The special character turned out to be C with a cedilla as in Curaçao

which looked like this in the dumped file

 Cura<87>ao 

after exporting it from SQL Server with SQLCMD -f o:65001 where 65001 is the Microsoft UTF8 code page.

I also tried SQLCMD -f o:1252 since the database says it's latin1.

I discovered what the special character was thanks to this excellent answer How can I find Unicode/non-ASCII characters in an NTEXT field in a SQL Server 2005 table?

and I assume that mySQL (as opposed to MariaDB) was simply importing it badly without complaint.

The MariaDB docs told me to set the --default-character-set on the command line appropriately, so I tried latin1 and utf8, neither of which worked.

I then looked at the long list of character sets for MariaDB in /usr/share/mysql/charsets/Index.xml and it turns out the cp1250 works without complaint but it imports the special character as

The command was mysql --default-character-set=cp1250.

I haven't been able to import the data correctly, but at least I've got it to import in MariaDB now.

Adam
  • 5,215
  • 5
  • 51
  • 90
  • Or use `SET NAMES latin1` after connecting. This declares that the client encoding is latin1. (This can differ from the table declarations; conversion is automatic.) – Rick James May 26 '18 at 00:16