characters appearing incorrectly even with Unicode source and destination (SSIS)

Question

I am having a codepage unicode/non unicode problem and need expertise to understand it.

In SSIS I am reading data in from a UTF8 encoded text file. The datatypes are all DT_WSTR (unicode string). The destination is NVARCHAR which is also unicode.

Non standard characters such as Ú are not being encoded correctly )appearing as a black box question mark).

If the character appears correctly in the input file, the source is set to DT_WSTR & the destination is nvarchar, why is the character not rendering correctly?

I have tried setting the codepage of the source column to 65001, but in SSIS its only possible to change the codepage on a STR (non unicode) type.

Id appreciate any help in understanding why all unicode fields still cant store a unicode value correctly.

Update from the OP comments

It seems my output is ok if i use Unicode types end to end (input is DT_WSTR, destination column is nvarchar & when extracting again to text, output column is DW_WSTR. The only issue is sql server management studio, which does not seem to be able to render unicode characters correctly in the results of a query, when setting output to grid or text. this is a red herring and the process overall works without issue if this is ignored

UTF-8 isn't supported by SQL Server until SQL Server 2019 (currently in preview). `nvarchar` <> UTF-8 I'm afraid. You'll need to change the encode of the file. — Thom A, Feb 08 '19 at 11:27
When Windows talks of Unicode it normally means UTF-16LE try converting your input file to that encoding before uploading. — JGNI, Feb 08 '19 at 11:34
Don't set the code page at the column level. Make sure the [flat file source code page is set to 65001 (UTF-8)](https://stackoverflow.com/questions/5322642/ssis-flat-file-always-ansi-never-utf-8-encoded). — Dan Guzman, Feb 08 '19 at 11:44
You can load a UTF-8 file into SQL Server and have been able to do so for a while. Take a look at this link: https://www.mssqltips.com/sqlservertip/3119/import-utf8-unicode-special-characters-with-sql-server-integration-services/ and double check your settings on the source. It might still be easier to convert to UTF-16 though. — Apep, Feb 08 '19 at 11:47
Update on this: It seems my output is ok if i use Unicode types end to end (input is DT_WSTR, destination column is nvarchar & when extracting again to text, output column is DW_WSTR. The only issue is sql server management studio, which does not seem to be able to render unicode characters correctly in the results of a query, when setting output to grid or text. this is a red herring and the process overall works without issue if this is ignored. — wilson_smyth, Feb 08 '19 at 11:54

Hadi · Answer 1 · 2019-02-10T15:19:08.830

Trying to figure out the issue

There is not problem importing unicode characters from flat files to SQL Server destination, the only thing you have to do is the set the flat file encoding as unicode, and the result columns must be NVARCHAR. Based on your question, it looks like you have met the requirements so i can say that:

Unicode Character are imported successfully to SQL Server, but for some reasons SQL Server Management Studio cannot show unicode characters in a grid Results, to check that data is imported correctly, change change the result view to Result To Text.

GoTo Tools >> Options >> Query Results >> Results To Text

In the second reference link i provided they mentioned that:

If you use SSMS for your queries, change to output type from "Grid" to "Text", because depending on the font the grid can't show unicode.

Or you can try to change the Grid Results font, (on my machine, i use Tahoma font and it shows unicode characters normally)

Experiments

You can perform the following test (taken from the links below)

SET NOCOUNT ON;



CREATE TABLE #test

  (  id int IDENTITY(1, 2) NOT NULL Primary KEY

    ,Uni nvarchar(20) NULL);

INSERT INTO #test (Uni) VALUES (N'DE: äöüßÖÜÄ');

INSERT INTO #test (Uni) VALUES (N'PL: śćźłę');

INSERT INTO #test (Uni) VALUES (N'JAP: 言も言わずに');

INSERT INTO #test (Uni) VALUES (N'CHN: 玉王瓜瓦甘生用田由疋');




SELECT * FROM #test;


GO

DROP TABLE #test;

Try the following query using Result as Grid and Result as Text options.

References

1

Astonishing as usual Mr. Hadi ! – amd Feb 16 '19 at 19:04

characters appearing incorrectly even with Unicode source and destination (SSIS)

Update from the OP comments

1 Answers1

Trying to figure out the issue

Experiments

References

Linked