I learned something from researching this question!
In SQL Server
- nvarchar takes double the storage because it uses a two-byte character set, UNICODE UCS-2.
n defines the string length ... The storage size, in bytes, is two times the actual length of data entered + 2 bytes.
This tells me that the length specified for nvarchar is most certainly the number of characters, not the number of bytes.
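To convince myself, here's a quick check (the variable and value are just my own example): LEN counts characters, DATALENGTH counts bytes, and the "+ 2 bytes" in the quoted formula is per-row overhead for variable-length columns, which DATALENGTH doesn't report.

```sql
-- Rough check of the "two times the actual length + 2 bytes" rule:
-- LEN returns the character count, DATALENGTH returns the data bytes.
DECLARE @greeting nvarchar(10) = N'hello';

SELECT LEN(@greeting)        AS character_count,  -- 5
       DATALENGTH(@greeting) AS data_bytes;       -- 10 = 2 * 5
```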
- varchar uses one byte per character and stores single-byte non-Unicode character data.
n defines the string length ... The storage size is the actual length of the data entered + 2 bytes.
I would infer from those two statements that the number indicated for the length of the varchar or nvarchar column is indeed the number of characters. The phrase "length of the data entered" is somewhat ambiguous, but from the two descriptions I think it's reasonable to conclude that they mean the number of characters entered.
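A quick sketch to check that inference (the Japanese string is just an example I picked): a three-character value fits in nvarchar(3) even though it needs six bytes of storage.

```sql
-- n counts characters, not bytes: three Japanese characters fit in
-- nvarchar(3) even though they occupy 6 bytes.
DECLARE @jp nvarchar(3) = N'日本語';

SELECT LEN(@jp)        AS character_count,  -- 3
       DATALENGTH(@jp) AS byte_count;       -- 6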
If you have the potential for receiving and storing two-byte character data, always choose nvarchar over varchar, even though performance may take a hit. The linked question and answers are helpful to see why.
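For example (my own variable names, and assuming a typical single-byte Latin1 collation), two-byte characters are silently turned into question marks when they get squeezed into a varchar:

```sql
-- Cyrillic text survives in nvarchar but is mangled when converted
-- to varchar under a single-byte code page.
DECLARE @v varchar(10)  = N'Привет',
        @n nvarchar(10) = N'Привет';

SELECT @v AS varchar_value,   -- '??????' on a typical Latin1 collation
       @n AS nvarchar_value;  -- 'Привет'
```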
The bottom line is that SQL Server expresses the length of a varchar or nvarchar column as the number of characters entered. It will take care of the storage for you. Don't worry about bytes!
NOTE: Adding to the confusion is that Oracle allows you to specify either a byte length or a character length in the native VARCHAR2 type:
Oracle VARCHAR2
With the increasing use of multi-byte character sets to support globalized databases comes the problem of bytes no longer equating to characters.

The VARCHAR2 and CHAR types support two methods of specifying lengths:

- In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character set.
- In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.
And it appears that the default is bytes!
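Here's a rough sketch of what that looks like on the Oracle side (table and column names are mine, and it assumes a multi-byte database character set such as AL32UTF8):

```sql
CREATE TABLE length_demo (
  ten_bytes VARCHAR2(10 BYTE),  -- room for 10 bytes, possibly fewer than 10 characters
  ten_chars VARCHAR2(10 CHAR)   -- room for 10 characters, possibly more than 10 bytes
);

-- 'Ñ' takes 2 bytes in AL32UTF8, so the BYTE column holds only five of them ...
INSERT INTO length_demo (ten_bytes) VALUES ('ÑÑÑÑÑ');
-- ... while the CHAR column accepts ten of them (20 bytes of storage).
INSERT INTO length_demo (ten_chars) VALUES ('ÑÑÑÑÑÑÑÑÑÑ');
```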
This seems to be creating confusion for more than just us:
Oracle varchar2 - bytes or chars
So if you're coming from an Oracle world, you might assume this is true everywhere. And if you're coming from a SQL Server world, you might not realize this is the case!
In SQL Server
The thing that confuses me is that UTF-8 encoded Unicode characters can take up to 4 bytes, and many take as few as 1 byte! And yet, the docs say each character takes exactly two bytes.
So really... How many bytes does one Unicode character take?
Answer: SQL Server is using UNICODE UCS-2, which uses a single code value (defined as one or more numbers representing a code point) between 0 and 65,535 for each character, and allows exactly two bytes (one 16-bit word) to represent that value.
Which explains why SQL Server can tell exactly how much space a character string will take based on its length: ALL characters take two bytes in an nvarchar column!
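A tiny illustration (the euro sign is just an example character I picked): every character maps to one code value between 0 and 65,535 and occupies exactly two bytes.

```sql
-- One character = one 16-bit code value = two bytes of storage.
SELECT UNICODE(N'€')    AS code_value,  -- 8364
       DATALENGTH(N'€') AS byte_count,  -- 2
       NCHAR(8364)      AS round_trip;  -- '€'
```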