I'm having terrible trouble trying to get Unicode (MS-SQL: nvarchar(length)) data out of an Oracle varchar2 column.
The connection is through the MS Oracle driver. Source/dest looks like this in the SSIS Toolbox:
I thought I would simply go into the Source component's Advanced Editor, and change both the External and Output columns to type DT_WSTR. No go. As soon as I click OK, and then go back into Adv. Editor - it's flipped back to DT_STR. I don't remember any other source behaving this badly. Note that this flip happens before execution or even validation. ValidateExternalMetaData On or Off makes no difference.
Looking at this answer I found out the Oracle server's NLS_CHARACTERSET setting, and the one in the local machine's registry:
(I ran this against the Oracle server:
SELECT parameter, value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
)
Oracle server: AL32UTF8 Local machine (registry): AMERICAN_AMERICA.WE8MSWIN1252
then tried some conversions in the Oracle SQL in the source. This:
CONVERT(THECOLUMN,'WE8MSWIN1252','AL32UTF8') AS THECOLUMN,
had no effect.
This:
CONVERT(THECOLUMN,'UTF8','AL32UTF8') AS THECOLUMN,
produced a weird effect. The non-English characters (specifically Polish Ł) came through perfectly in the Preview. But as soon as the data left the source (looked in Data Viewer), it was back to "Some characters [ASCII 26 instead of Ł] some more characters". And the SSIS datatype was still DT_STR, and refused to change to DT_WSTR.
I'm stuck. Can anyone help?
EDIT: more information Thank you all for comments.
Context
I'm dealing with an existing ETL installation of 400 packages, so changing the driver will be difficult.
For "reasons", I cannot even access the Oracle DAS except on one server (literally, remote desktop to that server). So fiddling about with server registry settings/language etc isn't possible without DBA/sysadmin involvement. Driver information:
and in ODBC drivers:
The list of possible drivers in the referenced other question isn't an exact match for my situation, as I'm trying to connect through SSIS, not raw .NET. What I'm trying to do is simply: get Unicode character data from an Oracle table (OK, it's probably a view, but that's beyond my reach), through SSIS, into an MS-SQL nvarchar column.
The data in Oracle is clearly Unicode of some sort (see my test with the Preview above). The SSIS Source component, however, clearly thinks that it's DT_STR (SSIS datatype corresponding to MS-SQL non-Unicode varchar). And refuses to accept a "hard override" of this column's type through the Advanced Editor. So while in the Preview (presumably, before the Source Component actually gets the data) the data is correct (with Polish characters), once it hits the Source Component it has already been corrupted down to "replace anything non-ASCII with ASCII(26)".
Running this as suggested by Wernfried:
SELECT TheColumn,DUMP(TheColumn,1016)
gives me this - again, only in the Preview, not in the data which flows out of the Source through SSIS:
PLUATREGŁ Typ=1 Len=10 Character Set=AL32UTF8: 50,4c,55,41,54,52,45,47,c5,81
Guessing at what this means: If I do MS-SQL
SELECT CHAR(CONVERT(int,0x50))
for each of these hex values, the characters makes sense, until C5 ("Å") and 81 (""). Perhaps the encoding used by the Oracle server is different from that on the local SQL Server.
But my main problem is getting anything other than non-Unicode text through into SSIS.