What is the meaning and difference between these queries?
SELECT U'String' FROM dual;
and
SELECT N'String' FROM dual;
What is the meaning and difference between these queries?
SELECT U'String' FROM dual;
and
SELECT N'String' FROM dual;
In this answer i will try to provide informations from official resources
N''
is used to convert a string to NCHAR
or NVARCHAR2
datatype
According to this Oracle documentation Oracle - Literals
The syntax of text literals is as follows:
where
N
orn
specifies the literal using the national character set (NCHAR
orNVARCHAR2
data).
Also in this second article Oracle - Datatypes
The N'String'
is used to convert a string to NCHAR
datatype
From the article listed above:
The following example compares the
translated_description
column of thepm.product_descriptions
table with a national character set string:SELECT translated_description FROM product_descriptions WHERE translated_name = N'LCD Monitor 11/PM';
U''
is used to handle the SQL NCHAR String Literals in Oracle Call Interface (OCI)
Based on this Oracle documentation Programming with Unicode
The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL
CHAR
andNCHAR
datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.
OCI is the lowest-level API for accessing a database, so it offers the best possible performance.
Handling SQL NCHAR String Literals in OCI
You can switch it on by setting the environment variable
ORA_NCHAR_LITERAL_REPLACE
toTRUE
. You can also achieve this behavior programmatically by using theOCI_NCHAR_LITERAL_REPLACE_ON
andOCI_NCHAR_LITERAL_REPLACE_OFF
modes inOCIEnvCreate()
andOCIEnvNlsCreate()
. So, for example,OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_ON)
turns onNCHAR
literal replacement, whileOCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_OFF)
turns it off.[...] Note that, when the
NCHAR
literal replacement is turned on,OCIStmtPrepare
andOCIStmtPrepare2
will transformN'
literals withU'
literals in the SQL text and store the resulting SQL text in the statement handle. Thus, if the application usesOCI_ATTR_STATEMENT
to retrieve the SQL text from theOCI
statement handle, the SQL text will returnU'
instead ofN'
as specified in the original text.
From datatypes perspective, there is not difference between both queries provided
N'string'
just returns the string
as NCHAR
type.
U'string'
returns also NCHAR
type, however it does additional processing to the string
: it replaces \\
with \
and \xxxx
with Unicode code point U+xxxx
, where xxxx
are 4 hexadecimal digits. This is similar to UNISTR('string')
, the difference is that the latter returns NVARCHAR2
.
U'
literals are useful when you want to have a Unicode string independent from encoding and NLS settings.
Example:
select n'\€', u'\\\20ac', n'\\\20ac' from dual;
N'\€' U'\\\20AC' N'\\\20AC'
----- ---------- ----------
\€ \€ \\\20ac
when using N'
we denote that given datatype is NCHAR
or NVARCHAR
.
U'
is used to denote unicode
The documented N''
literals are the same as standard character literals (''
) except that their data type is NVARCHAR2 and not VARCHAR2. It is important to note that the characters in these literals, together with the entire SQL statement, are converted from the client character set to the database character set when transmitted to the server. All characters from the literals that are not supported by the database character set are lost.
The data type of the undocumented U''
literals is also NVARCHAR2. The content of a U''
literal is interpreted like the input to the SQL UNISTR
function. That is, each character sequence \xxxx
, where each x
is one hex digit, is interpreted as a UTF-16 code point U+xxxx. I am not sure why the U''
literals are undocumented. I can only guess. They are used internally by the NCHAR literal replacement feature, which, when enable on a client, automatically translates N''
literals to U''
literals. This prevents the mentioned data loss due to character set conversion and enables literal Unicode data to be provided for NVARCHAR2 columns even if the database character set is not Unicode.
The two queries in this thread's question are generally not equivalent because the literal text would be interpreted differently. However, if no backslash is present in the literals, no difference can be observed.