22

I've read all about varchar versus nvarchar. But I didn't see an answer to what I think is a simple question. How do you determine the length of your nvarchar column? For varchar it's very simple: my Description, for example, can have 100 characters, so I define varchar(100). Now I'm told we need to internationalize and support any language. Does this mean I need to change my Description column to nvarchar(200), i.e. simply double the length? (And I'm ignoring all the other issues that are involved with internationalization for the moment.)

Is it that simple?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Kane Jeeves
  • 475
  • 2
  • 9
  • 19

3 Answers3

24

Generally it is the same as for varchar really. The number is still the maximum number of characters not the data length.

nvarchar(100) allows 100 characters (which would potentially consume 200 bytes in SQL Server).

You might want to allow for the fact that different cultures may take more characters to express the same thing though.

An exception to this is however is if you are using an SC collation (which supports supplementary characters). In that case a single character can potentially take up to 4 bytes.

So worst case would be to double the character value declared.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • 1
    Hmmm. On another MS site I did see this comment from an MS MVP which seems to contradict what you're saying: Note that nvarchar(50) does not mean 50 characters - it means 50 double-bytes. Thus, you can only store 25 characters that requires 4 bytes in nvarchar(50). I'm not saying you're wrong, just that I'm confused. – Kane Jeeves Jan 21 '14 at 13:51
  • 4
    @KaneJeeves - That is wrong. Easy to test yourself! `DECLARE @T TABLE (C NVARCHAR(5)); INSERT INTO @T VALUES ('Wrong');SELECT C, DATALENGTH(C) FROM @T`. Do you have a link? – Martin Smith Jan 21 '14 at 13:55
  • 2
    Awesome, thanks. Also proves the point that you can't always trust that someone with a certification is always right. – Kane Jeeves Jan 21 '14 at 13:58
  • @KaneJeeves To be fair to Erland he was talking about **4 byte** characters. [Ref](http://social.msdn.microsoft.com/Forums/sqlserver/en-US/918f7c8b-cc16-4c03-b4f9-f9b0660e0587/unicode-vs-ascii-difference-and-benefits?forum=transactsql). – Mikael Eriksson Jan 21 '14 at 14:03
  • @MikaelEriksson - utf-16? That's a new one to me. I thought SQL Server used UCS-2. Has it changed? – Martin Smith Jan 21 '14 at 14:06
  • 1
    New for me too, just found it googling. Apparently it is a collation thing where you use `_SC` on a windows collation. – Mikael Eriksson Jan 21 '14 at 14:08
  • @MikaelEriksson - Thanks. [Found the relevant BOL article now](http://technet.microsoft.com/en-us/library/ms180942(v=sql.105).aspx) – Martin Smith Jan 21 '14 at 14:14
  • No example. Documented here for SQL Server 2012.[Collation and Unicode Support](http://msdn.microsoft.com/en-us/library/ms143726(v=sql.110).aspx) – Mikael Eriksson Jan 21 '14 at 14:16
  • @MikaelEriksson - Yes I deleted that comment (Asking if any examples where `LEN=1` and `DATALENGTH=4`) as the link I found explicitly says that `LEN` doesn't work correctly for these. – Martin Smith Jan 21 '14 at 14:18
  • @MikaelEriksson - Thanks! Maybe this was improved in 2012 then. The article I found was 2008 R2. – Martin Smith Jan 21 '14 at 14:25
  • @KaneJeeves - The comment you originally found was entirely accurate. Thanks to Mikael for pointing it out. Incorporated into my answer now. – Martin Smith Jan 21 '14 at 14:32
  • So what does all this mean? @Martins original test would not work if my SQL used UTF-16? Since it does work my system uses UTF-8 or SQL just accounts for things correctly? – Kane Jeeves Jan 21 '14 at 14:33
  • @KaneJeeves - My test would be the same on all collations. In a `SC` collation unicode characters less than `0xFFFF` still only take 2 bytes. – Martin Smith Jan 21 '14 at 14:47
  • For following table CREATE TABLE [dbo].tblPlan( [ID] [nvarchar](10) NOT NULL, .................................. ............................ command Length of the column will be 20. – SKARVA Bodavula Jul 04 '17 at 13:30
  • 1
    How do you reconcile your answer ("The number is still the maximum number of characters, not the data length") with the [documentation](https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-ver15#remarks): "A common misconception is to think that `NCHAR(n)` and `NVARCHAR(n)`, the `n` defines the number of characters"? – OfirD Oct 06 '21 at 10:18
7

From microsoft web site:

A common misconception is to think that NCHAR(n) and NVARCHAR(n), the n defines the number of characters. But in NCHAR(n) and NVARCHAR(n) the n defines the string length in byte-pairs (0-4,000). n never defines numbers of characters that can be stored. This is similar to the definition of CHAR(n) and VARCHAR(n). The misconception happens because when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair. However, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs. For example, in a column defined as NCHAR(10), the Database Engine can store 10 characters that use one byte-pair (Unicode range 0-65,535), but less than 10 characters when using two byte-pairs (Unicode range 65,536-1,114,111). For more information about Unicode storage and character ranges, see

https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-ver15

Musa Caglar
  • 71
  • 1
  • 1
2

@Musa Calgar - exactly right. That link has the information for the answer to this question.

But to make sure the question itself is clear, we are talking about the 'length' attribute we see when we look at the column definition for a given table, right? That is the storage allocated per column. On the other hand, if we want to know the number of characters for a given string in the table at a given moment you can: "SELECT myColumn, LEN(myColumn) FROM myTable"

But if the storage length is desired, you can drag the table name into the query window using SSMS, highlight it, and use 'Alt-F1' to see the defined lengths of each column.

So as an example, I created a table like this specifiying collations. (Latin1_General_100_CI_AS_SC allows for supplemental characters - that is, characters that take more than just 2 bytes):

CREATE TABLE [dbo].[TestTable1](
    [col1] [varchar](10)  COLLATE Latin1_General_100_CI_AS,
    [col2] [nvarchar](10) COLLATE Latin1_General_100_CI_AS_SC,
    [col3] [nvarchar](10) COLLATE Latin1_General_100_CI_AS
) ON [PRIMARY]

The lengths show up like this (Highlight in query window and Alt-F1):

Column_Name    Type        Length  [...] Collation

col1           varchar      10           Latin1_General_100_CI_AS
col2           nvarchar     20           Latin1_General_100_CI_AS_SC
col3           nvarchar     20           Latin1_General_100_CI_AS

If you insert ASCII characters into the varchar and nvarchar fields, it will allow you to put 10 characters into all of them. There will be an error if you try to put more than 10 characters into those fields:

"String or binary data would be truncated. The statement has been terminated."

If you insert non-ASCII characters like 'ā' you can still put 10 of them into each one, but SQL Server will convert the values going into col1 to the closest known character that fits into 1-byte. In this case, 'ā' will be converted to 'a'.

However, if you insert characters that require 4 bytes to store, like for example, '', you will only be allowed to put FIVE of them into the varchar and nvarchar fields. Any more than that will result in the truncation error shown above. The varchar field will show question marks because it has no single-byte character that it can convert that input to.

So when you insert five of these '', do a select of that row using len(<colname>) and you will see this:

col1          len(col1)    col2          len(col2)      col3           len(col3)
??????????    10                5                    10

So the length of col2 shows 5 characters since supplemental characters were defined when the table was created (see above CREATE TABLE DDL statement). However, col3 did not have _SC for its collation, so it is showing length 10 for the five characters we inserted. Note that col1 has ten question marks. If we had defined the col1 varchar using the _SC collation instead of the non-supplemental one, it would behave the same way.

Gaius Gracchus
  • 111
  • 1
  • 3