I have to import some data containing national characters not coded as Unicode into stand alone MySQL Version: 5.0.18 running on Windows7 64bit. After some initial problems I finally got it working in MySQL console.
But as the data is more than 50 MByte typing into console and or using clipboard is not possible. So I created script file only to find out that the national characters are gibberish after import.
The problem is that if I use source
command with any file the encoding broke. If I open the same file and copy the stuff to console by clipboard all works as should. Here smallest MCVE to test this:
DROP DATABASE IF EXISTS dbs;
CREATE DATABASE dbs;
USE dbs;
SET NAMES latin2;
DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(50) default '' );
INSERT INTO `tab` VALUES
(1,'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ'),
(2,'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ');
SELECT * FROM `tab`;
When I copy this to MySQL console by clipboard the output is like this:
+------+----------------------------------------+
| ix | nam |
+------+----------------------------------------+
| 1 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
| 2 | áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ |
+------+----------------------------------------+
2 rows in set (0.00 sec)
Which is desired. But when I put all this into test.sql
file and run
source test.sql;
I got this output:
+------+----------------------------------------+
| ix | nam |
+------+----------------------------------------+
| 1 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
| 2 | ßńŔ´ÚÝňż˛ˇ˘Ó°ÜŁ˙ ř×┴ ╚¤╔═┼╝ĎË └ěŐŹ┌ ŢÄ |
+------+----------------------------------------+
Which is obviously wrong (looks like some default MS-DOS charset). I think the problem is not on side of table nor database as this is the same for pure text outs like:
SET NAMES latin2;
SELECT 'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ' AS 'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ';
Which outputs with clipboard:
+----------------------------------------+
| aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+----------------------------------------+
| áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ |
+----------------------------------------+
And with source
file:
+----------------------------------------+
| aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+----------------------------------------+
| ßńŔ´ÚÝňż˛ˇ˘Ó°ÜŁ˙ ř×┴ ╚¤╔═┼╝ĎË └ěŐŹ┌ ŢÄ |
+----------------------------------------+
It is like when importing from files the encoding got screwed. Or while inputing to MySQL console through Keyboard or Clipboard the coding got changed.
So what is going on and how to correct this (without the loss of data)?
- using
<
command line option instead ofsource
does not help - using
-e
command line option forsource
does not help - using default charset command line option does not help
- using UTF8 for non Unicode string lead to Data too long errors and loss of data
- data in clipboard is the same as in the file
[Edit1]
Well I tried newer version of MySQL 5.7.19 which took mi quite a while to get started as they change the initialization and things (wtf? that insanity got 1.8 GByte without any data !). It behaves the same no matter what I do. So I tried to use UTF8 encoding:
DROP DATABASE IF EXISTS dbs;
CREATE DATABASE dbs CHARACTER SET utf8 COLLATE 'utf8_unicode_ci';
USE dbs;
SET NAMES utf8;
DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(50) default '' ) CHARACTER SET utf8 COLLATE 'utf8_unicode_ci';
INSERT INTO `tab` VALUES
(1,'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ'),
(2,'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ');
SELECT * FROM `tab`;
#SELECT 'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ' AS 'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ';
#SHOW COLLATION;
#SHOW CHARACTER SET;
SHOW VARIABLES LIKE 'char%';
And Yes the script file is converted to UTF8. Now here is my.ini setting:
[mysql]
default-character-set=utf8
[mysqld]
skip-character-set-client-handshake
character-set-server=utf8
collation-server=utf8_unicode_ci
This finally worked for the file using source
here the result:
+------+--------------------------------------------------------------------------+
| ix | nam |
+------+--------------------------------------------------------------------------+
| 1 | áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ |
| 2 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+------+--------------------------------------------------------------------------+
+--------------------------+--------------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------------------------------------------------------------------+