
We are migrating some data from SQL Server to Oracle. For columns defined as NVARCHAR in SQL Server, we started creating NVARCHAR2 columns in Oracle, thinking them to be similar. But it looks like they are not.

I have read a couple of posts on Stack Overflow and want to confirm my findings.

Oracle VARCHAR2 already supports Unicode if the database character set is, say, AL32UTF8 (which is true in our case).
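A query along the following lines should confirm the database character set:

SELECT value
  FROM nls_database_parameters
 WHERE parameter = 'NLS_CHARACTERSET';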

SQL Server VARCHAR does not support Unicode. SQL Server explicitly requires columns to be of type NCHAR/NVARCHAR to store Unicode data (specifically in the 2-byte UCS-2 format).

Hence, would it be correct to say that SQL Server NVARCHAR columns can/should be migrated as Oracle VARCHAR2 columns?

Zenil
  • Microsoft confirms Oracle NVARCHAR2 <==> SQL Server NVARCHAR. But what I wanted to confirm was Oracle VARCHAR2 <==> SQL Server NVARCHAR? (as Oracle VARCHAR2 is Unicode-ready) – Zenil Aug 20 '13 at 00:31

1 Answer


Yes, if your Oracle database is created using a Unicode character set, an NVARCHAR in SQL Server should be migrated to a VARCHAR2 in Oracle. In Oracle, the NVARCHAR2 data type exists to allow applications to store data using a Unicode character set when the database character set itself does not support Unicode.

One thing to be aware of in migrating, however, is character length semantics. In SQL Server, an NVARCHAR(20) allocates space for 20 characters, which requires up to 40 bytes in UCS-2. In Oracle, by default, a VARCHAR2(20) allocates 20 bytes of storage. In the AL32UTF8 character set, that is potentially only enough space for 5 characters, though most likely it will handle much more (a single character in AL32UTF8 requires between 1 and 4 bytes). You probably want to declare your Oracle types as VARCHAR2(20 CHAR), which indicates that you want to allocate space for 20 characters regardless of how many bytes that requires. That tends to be much easier to communicate than trying to explain why some 20-character strings are allowed while other 10-character strings are rejected.
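As a minimal sketch of the difference (the table names here are made up for illustration, assuming an AL32UTF8 database):

CREATE TABLE t_byte (val VARCHAR2(20));      -- byte semantics: 20 bytes
CREATE TABLE t_char (val VARCHAR2(20 CHAR)); -- character semantics: 20 characters

-- 20 Japanese characters need about 60 bytes in AL32UTF8, so this value
-- raises ORA-12899 against t_byte but fits in t_char.
INSERT INTO t_char (val) VALUES ('ようこそようこそようこそようこそようこそ');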

You can change the default length semantics at the session level, so that any tables you create without explicitly specifying length semantics will use character rather than byte semantics:

ALTER SESSION SET nls_length_semantics=CHAR;

That lets you avoid typing CHAR every time you define a new column. It is also possible to set that at a system level, but doing so is discouraged by the NLS team; apparently, not all the scripts Oracle provides have been thoroughly tested against databases where NLS_LENGTH_SEMANTICS has been changed. And probably very few third-party scripts have been.
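If you want to confirm which semantics a given column ended up with, the data dictionary records it. A minimal sketch, querying the illustrative tables from the earlier example:

SELECT table_name, column_name, char_used, char_length
  FROM user_tab_columns
 WHERE table_name IN ('T_BYTE', 'T_CHAR');

CHAR_USED reports 'B' for byte semantics and 'C' for character semantics.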

Justin Cave
  • Great answer... I had a couple of questions. When we actually do the migration, should we be concerned about data truncation issues? Assume we create all the Oracle columns as VARCHAR2. Then any data from a SQL Server VARCHAR column should migrate correctly. What about data from a SQL Server NVARCHAR column? SQL Server NVARCHAR stores data in UTF-16 while Oracle VARCHAR2 is UTF-8. The migration tool should somehow take care of this? Please add your thoughts to the main answer. – Zenil Aug 20 '13 at 18:30
  • @Zenil - I believe I've already covered this in my answer. Assuming that you use character length semantics when you are defining the columns in Oracle, your Oracle `varchar2(20 char)` and your SQL Server `nvarchar(20)` will each have space for 20 characters. If they each have space for 20 characters, you shouldn't need to worry about truncation issues. – Justin Cave Aug 20 '13 at 18:34
  • I think you addressed the truncation issue but not the encoding issue. A SQL Server NVARCHAR column is encoded in UTF-16 while the Oracle VARCHAR2 column will be encoded in UTF-8. So I guess the migration tool should be aware of this fact and do the proper transformations. I should find this out when we reach that stage. – Zenil Aug 20 '13 at 18:40
  • @Zenil - The internal encoding doesn't come into play. Well, there are different revisions of the Unicode standard that have expanded the number of defined characters over time, so if you happen to go from a database supporting Unicode 6.2, where some character is defined, to a database that supports an earlier version of the standard where that character doesn't exist, you'd have problems. But those would occur regardless of whether the data was stored in UTF-8 or UTF-16 or UTF-32 or UCS-2. This generally isn't a practical concern, though. – Justin Cave Aug 20 '13 at 18:51
    "`In SQL Server, a NVARCHAR(20) allocates space for 20 characters`" : @JustinCave , This is not correct. The number 20 does not represent the number of characters but "the string size in byte-pairs". This is very confusing, and many people make this mistake, since if you use characters in the range of code points 0-65535 then each character size is 2 bytes and the number is equal the number of characters. **But** if you use characters in the range of 65536-1114111 then each character size is 4 bytes and using NVARCHAR(20) allocates space only for 10 characters. – Ronen Ariely Aug 05 '19 at 03:09