1

I am importing data from a MySQL Database into a Microsoft SQL Server Database (SQL_Latin1_General_CP1_CI_AS). I use a linkedserver connection based on OBDC in combination with an Openquery statement.

I use the following MS SQL Statement:

 SELECT street_address FROM openquery(MYLINKEDSERVER,'SELECT street_address FROM customers')

Some of the addresses contain Chinese characters, when using openquery these are imported as question marks, example result:

??? ??? 74 ?????501?

I tried converting using latin1 - no luck;

 SELECT * FROM openquery(MYLINKEDSERVER,'SELECT convert(street_address using latin1) FROM customers')

Any ideas how to import the Chinese characters into MS SQL?

Evaldas Buinauskas
  • 13,739
  • 11
  • 55
  • 107
phicon
  • 3,549
  • 5
  • 32
  • 62
  • 1
    What is the datatype of `street_address` is it `nvarchar` ? – ahmed abdelqader Jan 03 '17 at 11:33
  • the datatype in mysql is varchar – phicon Jan 03 '17 at 12:06
  • and in SQL Server ?, it should be `nvarchar` – ahmed abdelqader Jan 03 '17 at 12:30
  • 1
    You can't read Chinese characters with a Latin1 codepage! What is the type of the field? It should be UTF8. Also, how did you set up the linked server? What are the connection settings? Is codepage translation activated? – Panagiotis Kanavos Jan 03 '17 at 13:15
  • What is the *collation* of the field in MySQL? All text fields are `varchar` in MySQL, with Unicode marked as UTF8 – Panagiotis Kanavos Jan 03 '17 at 13:18
  • in MySQL it is UTF8 - codepage translation is not activated - i will look into this – phicon Jan 03 '17 at 13:48
  • Are you using the Unicode or ANSI ODBC driver? OLEDB drivers handle translation automatically. With ODBC though, you'd have to use the Unicode driver, otherwise you'd only get ANSI back, no matter the encoding. Check [this](https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-installation.html#connector-odbc-unicode-ansi) on the driver docs and [this question](http://stackoverflow.com/questions/25889046/difference-between-ansi-and-unicode-drivers-of-mysql) about the differences between the two drivers – Panagiotis Kanavos Jan 03 '17 at 14:03

2 Answers2

2

There are two ODBC drivers, one ANSI (myodbc5a.dll) and one Unicode (myodbc5w.dll). The ANSI driver will always return varchar data. You need to use the Unicode driver in order to return Chinese characters as Unicode. 

Check the relevant paragraph on the ODBC driver's installation page and this related question.

When converting from one codepage to another, unknown or unrepresentable characters are replaced with ? or the Unicode Replacement character �. Such characters in a string are a sure sign that a wrong conversion was attempted. 

Community
  • 1
  • 1
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
0

So i managed to make it work.

  1. convert to binary in MySQL
  2. Import in SQL Server through openquery
  3. cast to varchar(max) in SQL Server
  4. convert varchar UTF8 to Nvarchar using function --> link

result query;

SELECT dbo.func_utf_string(cast(bin AS varchar(MAX))) from openquery(MYLINKEDSERVER,'SELECT cast(street as binary) as bin from customers')
Community
  • 1
  • 1
phicon
  • 3,549
  • 5
  • 32
  • 62
  • How about casting to UTF8 on MySQL's side and leave it as is on SQL Server, ie `nvarchar`? Your original query used `latin1`. If the cast to bin worked, that's because the data was *NOT* Latin1 but UTF8. The driver knows how to transale UTF8 to Unicode. – Panagiotis Kanavos Jan 03 '17 at 13:11
  • The driver also knows how to translate *any* codepage to Unicode, provided the correct one is used at the source. When the wrong codepage is used, invalid characters are replaced with `?`. If, for example, Chinese is stored in a table with a `Latin1` collation, you have a broken conversion.the source column is *not* UTF8, you also broke the conversion. – Panagiotis Kanavos Jan 03 '17 at 13:17
  • if i leave it as UTF8, the select returns varchar which does not support the Chinese chars. Codepage translation in the Linked Server is something i have not tested but will look into. TY – phicon Jan 03 '17 at 13:51
  • That's why I asked about the connection settings in the comments. The ADO.NET/OLEDB drivers have an auto translation setting that converts between encodings and Unicode. The ODBC driver on the other hand comes in Unicode (myodbc5w.dll) and ANSI versions (myodbc5a.dll). – Panagiotis Kanavos Jan 03 '17 at 13:59
  • Chaged the odbc driver to Unicode, this did the trick. Mark as answer and ill accept. TY – phicon Jan 03 '17 at 15:10