0

I'm having terrible trouble trying to get Unicode (MS-SQL: nvarchar(length)) data out of an Oracle varchar2 column.

The connection is through the MS Oracle driver. Source/dest looks like this in the SSIS Toolbox:

Oracle SSIS Toolbox items

I thought I would simply go into the Source component's Advanced Editor, and change both the External and Output columns to type DT_WSTR. No go. As soon as I click OK, and then go back into Adv. Editor - it's flipped back to DT_STR. I don't remember any other source behaving this badly. Note that this flip happens before execution or even validation. ValidateExternalMetaData On or Off makes no difference.

Looking at this answer I found out the Oracle server's NLS_CHARACTERSET setting, and the one in the local machine's registry:

(I ran this against the Oracle server:

SELECT parameter, value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';

)

Oracle server: AL32UTF8 Local machine (registry): AMERICAN_AMERICA.WE8MSWIN1252

then tried some conversions in the Oracle SQL in the source. This:

CONVERT(THECOLUMN,'WE8MSWIN1252','AL32UTF8') AS THECOLUMN,

had no effect.

This:

CONVERT(THECOLUMN,'UTF8','AL32UTF8') AS THECOLUMN,

produced a weird effect. The non-English characters (specifically Polish Ł) came through perfectly in the Preview. But as soon as the data left the source (looked in Data Viewer), it was back to "Some characters [ASCII 26 instead of Ł] some more characters". And the SSIS datatype was still DT_STR, and refused to change to DT_WSTR.

I'm stuck. Can anyone help?

EDIT: more information Thank you all for comments.

Context I'm dealing with an existing ETL installation of 400 packages, so changing the driver will be difficult. For "reasons", I cannot even access the Oracle DAS except on one server (literally, remote desktop to that server). So fiddling about with server registry settings/language etc isn't possible without DBA/sysadmin involvement. Driver information: Control Panel Software detail and in ODBC drivers: ODBC Oracle drivers

The list of possible drivers in the referenced other question isn't an exact match for my situation, as I'm trying to connect through SSIS, not raw .NET. What I'm trying to do is simply: get Unicode character data from an Oracle table (OK, it's probably a view, but that's beyond my reach), through SSIS, into an MS-SQL nvarchar column.

The data in Oracle is clearly Unicode of some sort (see my test with the Preview above). The SSIS Source component, however, clearly thinks that it's DT_STR (SSIS datatype corresponding to MS-SQL non-Unicode varchar). And refuses to accept a "hard override" of this column's type through the Advanced Editor. So while in the Preview (presumably, before the Source Component actually gets the data) the data is correct (with Polish characters), once it hits the Source Component it has already been corrupted down to "replace anything non-ASCII with ASCII(26)".

Running this as suggested by Wernfried:

SELECT TheColumn,DUMP(TheColumn,1016)

gives me this - again, only in the Preview, not in the data which flows out of the Source through SSIS:

PLUATREGŁ Typ=1 Len=10 Character Set=AL32UTF8: 50,4c,55,41,54,52,45,47,c5,81

Guessing at what this means: If I do MS-SQL

SELECT CHAR(CONVERT(int,0x50))

for each of these hex values, the characters makes sense, until C5 ("Å") and 81 (""). Perhaps the encoding used by the Oracle server is different from that on the local SQL Server.

But my main problem is getting anything other than non-Unicode text through into SSIS.

SebTHU
  • 1,385
  • 2
  • 11
  • 22
  • 1
    What is the Oracle database character set (not NLS_LANG; `select value from nls_database_parameters where parameter='NLS_CHARACTERSET'`)? And does changing your local machine setting to `AMERICAN_AMERICA.AL32UTF8` make any difference? (Does SSIS actually use that setting?) – Alex Poole Oct 20 '22 at 17:42
  • 1
    Not fully clear what you try to achieve. Usually `CONVERT` is **only** used when you inserted data with a misconfigured client and you like to repair such data. You said, "characters came through perfectly in the preview" thus `CONVERT` should be wrong in any case. – Wernfried Domscheit Oct 20 '22 at 17:43
  • Which exact driver do you use? Most of the Microsoft drivers (see [How to connect to Oracle 11 database from . net](https://stackoverflow.com/questions/34803106/how-to-connect-to-oracle-11-database-from-net/34805999#34805999)) are deprecated and have problems with non-ASCII characters. – Wernfried Domscheit Oct 20 '22 at 17:45
  • Perhaps you are not fully aware what `NLS_LANG` actually does. By setting `NLS_LANG=.WE8MSWIN1252` you tell the Oracle database "my client (in your case: my SSIS) uses characterset Win1252" - no more, no less! Setting `NLS_LANG` does not change the SSIS characterset, and setting SSIS characterset does not change the `NLS_LANG` value. See also http://stackoverflow.com/questions/33783902/odbcconnection-returning-chinese-characters-as/33790600#33790600 – Wernfried Domscheit Oct 20 '22 at 17:51
  • What do you get from `SELECT DUMP(..., 1016) FROM ...`? – Wernfried Domscheit Oct 20 '22 at 17:57
  • "UTF8" is the Oracle name for [CESU-8](https://en.wikipedia.org/wiki/CESU-8). As long you stay in the [BMP](https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane) (characters less than `U+​FFFF` there is no difference to "AL32UTF8", the Oracle name for "real" UTF-8 – Wernfried Domscheit Oct 20 '22 at 18:01
  • @WernfriedDomscheit - edited the question with more information. – SebTHU Oct 21 '22 at 08:59

2 Answers2

1

The result from DUMP is perfect, certainly you don't need any CONVERT.

I think the problem is the ancient and deprecated ODBC driver "Microsoft ODBC for Oracle", it was based on Oracle 7.3x!

See ODBC Driver for Oracle

The ODBC Driver for Oracle does not support any of the new Oracle8 data types - Unicode data types, BLOBs, CLOBs, and so on

The second driver "Microsoft Oracle ODBC Driver" I have never seen before. According to Driver history for Microsoft SQL Server it is also deprecated.

You should use the ODBC driver provided by Oracle, you can download the "ODBC Pacakge" from Oracle Instant Client Downloads for Microsoft Windows 32-bit or Oracle Instant Client Downloads for Microsoft Windows (x64) 64-bit

Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110
  • Hi - and thanks! Unfortunately, I can't do anything about the drivers - like I said, this is an ETL installation with 400+ packages. And the Oracle side is all tangled up with IBM HPCs... But I'll bear this in mind if we ever get to a position to look at such a big change. – SebTHU Nov 08 '22 at 12:48
0

The answer, oddly enough, was simply this, in the Oracle SQL:

Cast(TheMisBehavingColumn AS NVARCHAR2(240)) AS TheMisbehavingColumn

This produces an (in MS-SQL/SSIS terms) nvarchar (DT_WSTR) column. Why I have to CAST a column which is already NVARCHAR2 to NVARCHAR2, I just don't know.

SebTHU
  • 1,385
  • 2
  • 11
  • 22