1

Currently I do an extract with SQLCMD from an old SQL Server DB where the varchar column is latin1 and I generate INSERT statements into a file to load the data into a MariaDB database:

INSERT INTO maria_table (id, name, description) VALUES
 (210,'ALL.M.USDBBL','Refining GOA All Monthly')
,(211,'S.M.USDBBL','RefinGdd 2 BBL'
,(212,'R.M.USDBBL','RefinGdd 2BL') 
, etc

using the SQL Server command and options:

 SQLCMD.EXE -i "generate-inserts.sql" -h-1 -S "my-host" -d Tardis -r1 -W -b -f o:65001 > inserts.sql

Also tried the -u flag instead with no luck.

This is the SQL:

SELECT CONCAT('INSERT INTO maria_table (id, name, description)',
              ' VALUES ');

SELECT CONCAT(IIF(row_number() OVER (ORDER BY id ) = 1, ' ', ','),
    '(',
    id, ',',
    '''', dbo.encode4MySql(name), ''',',
    '''', dbo.encode4MySql(description), ''')'
)
FROM sql_srv_table WITH (NOLOCK)
ORDER BY id;

SELECT ';';

SELECT 'COMMIT;';

and this is the function I need to develop:

CREATE FUNCTION dbo.encode4MySql(@raw VARCHAR(4000))
      RETURNS VARCHAR(4000) AS BEGIN
  DECLARE @replaced VARCHAR(4000);

  SET @replaced = REPLACE(@raw, '''', '''''');

  ??? what do I do here to replace those special characters?

  RETURN @replaced;

END;

All I am replacing so far is the '.

But I get errors like this from MariaDB:

ERROR 1366 (22007) at line 384: 
    Incorrect string value: '\x87ao Su...' for column 'description' at row 1995

I asked a similar question here SQL Server dump of varchar data to file loads in mysql but not mariadb but essentially didn't get the whole solution, so I'm asking more focused question this time about how to encode that and further incompatible characters.

As someone on the previous question commented, "\x87 is potentially the character ‡ in Latin1".

I'm also unable to locate the special character in the dump file - searching with any combination of \x87ao or ‡ doesn't work.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Adam
  • 5,215
  • 5
  • 51
  • 90
  • Finally managed to locate the special character thanks to this answer: https://stackoverflow.com/questions/686967/how-can-i-find-unicode-non-ascii-characters-in-an-ntext-field-in-a-sql-server-20#answer-687037 – Adam May 18 '18 at 10:01
  • 1
    In my experience MySQL handles mixed encodings nicely—I guess so does MariaDB. But you need to declare encoding correctly. If you happen to be using the [official command line](https://mariadb.com/kb/en/library/mysql-command-line-client/#mysql-commands) you need to set `--default-character-set`. Also, you SQL Server encoding will possibly not be Latin-1 but something like Windows-1252. – Álvaro González May 18 '18 at 10:55
  • I cannot find the right translation. Using `SQLCMD -f o:1252 | mysql --default-character-set=1252` causes the `ç` to morph into `‡` – Adam May 18 '18 at 10:59
  • That's not how MySQL encodings are named. And I get *mysql: Character set '1252' is not a compiled character set* (it's strange you don't). – Álvaro González May 18 '18 at 11:01
  • [https://dev.mysql.com/doc/refman/8.0/en/charset-we-sets.html](https://dev.mysql.com/doc/refman/8.0/en/charset-we-sets.html). Enjoy the part where it says: "latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set." (yeah, not kidding.) – Álvaro González May 18 '18 at 11:03
  • Sorry, my mistake. It is `--default-character-set=cp1252` – Adam May 18 '18 at 11:29
  • I realise I shared a link to MySQL docs, not MariaDB—it should have been [Supported Character Sets and Collation](https://mariadb.com/kb/en/library/supported-character-sets-and-collations/). Whatever, none of them seem to have a character set called `cp1252`. I don't know why you don't get an error message but, as I said, I don't have direct experience with MariaDB (perhaps MariaDB has an *ignore errors* global settings or something). – Álvaro González May 18 '18 at 11:38
  • Oh I'm giving wrong info again. It's `cp1250` which is listed in `/usr/share/mysql/charsets/Index.xml` which is where the `mysql` looks for valid code pages. – Adam May 18 '18 at 11:53
  • Oh, I thought you were talking about something similar to Latin-1. Win-1250 is for Eastern Europe languages (you know, `Ś`, `ź` and the like). How exactly have you determined the encoding used by SQL Server? – Álvaro González May 18 '18 at 11:58
  • cp1252 is Western European - https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx and I haven't determined what code page SQL Server is 'using' - it just says `latin1` and the SQLCMD docs say it will automatically adjust to your console if you don't specify anything. – Adam May 18 '18 at 14:00
  • I don't have SQL Server here to test but [according Microsoft](https://learn.microsoft.com/en-us/sql/relational-databases/collations/view-collation-information?view=sql-server-2017) you can try e.g. `SELECT CONVERT (varchar, DATABASEPROPERTYEX('database_name','collation')); `. BTW, 1250 and 1252 are different code pages. – Álvaro González May 19 '18 at 17:33

1 Answers1

0

There are at least 357 utf8 characters ending with 87. Here are a few of them:

| Ç               | LATIN CAPITAL LETTER C WITH CEDILLA      |
| ć               | LATIN SMALL LETTER C WITH ACUTE          |
| Ň               | LATIN CAPITAL LETTER N WITH CARON        |
| Ƈ               | LATIN CAPITAL LETTER C WITH HOOK         |
| LJ               | LATIN CAPITAL LETTER LJ                  |
| ȇ               | LATIN SMALL LETTER E WITH INVERTED BREVE |
| ɇ               | LATIN SMALL LETTER E WITH STROKE         |
| ʇ               | LATIN SMALL LETTER TURNED T              |
| ˇ               | CARON                                    |
| ·               | GREEK ANO TELEIA                         |
| χ               | GREEK SMALL LETTER CHI                   |
| Ї               | CYRILLIC CAPITAL LETTER YI               |
| ч               | CYRILLIC SMALL LETTER CHE                |

Search your file for ao Su, maybe you can find it.

cp1252 is not available in MariaDB; perhaps you were using latin1, which is the 'same'? What version are you using?

Rick James
  • 135,179
  • 13
  • 127
  • 222