I learned something from researching this question!
In SQL Server
- nvarchar takes double the storage because it uses a two-byte character set, UNICODE UCS-2.
n defines the string length ... The storage size, in bytes, is two times the actual length of data entered + 2 bytes.
This tells me that the length specified for nvarchar is most certainly the number of characters, not the number of bytes.
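To convince myself, here's a quick check (the variable and value are just my own example): LEN counts characters, DATALENGTH counts bytes, and the "+ 2 bytes" in the quoted formula is per-row overhead for variable-length columns, which DATALENGTH doesn't report.

```sql
-- Rough check of the "two times the actual length + 2 bytes" rule:
-- LEN returns the character count, DATALENGTH returns the data bytes.
DECLARE @greeting nvarchar(10) = N'hello';

SELECT LEN(@greeting)        AS character_count,  -- 5
       DATALENGTH(@greeting) AS data_bytes;       -- 10 = 2 * 5
```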
- varchar uses one byte per character and stores single-byte non-Unicode character data.
n defines the string length ... The storage size is the actual length of the data entered + 2 bytes.
I would infer from those two statements that the number indicated for the length of the varchar or nvarchar column is indeed the number of characters. The phrase "length of the data entered" is somewhat ambiguous, but from the two descriptions I think it's reasonable to conclude that they mean the number of characters entered.
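A quick sketch to check that inference (the Japanese string is just an example I picked): a three-character value fits in nvarchar(3) even though it needs six bytes of storage.

```sql
-- n counts characters, not bytes: three Japanese characters fit in
-- nvarchar(3) even though they occupy 6 bytes.
DECLARE @jp nvarchar(3) = N'日本語';

SELECT LEN(@jp)        AS character_count,  -- 3
       DATALENGTH(@jp) AS byte_count;       -- 6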
If you have the potential for receiving and storing two-byte character data, always choose nvarchar over varchar, even though performance may take a hit. The linked question and answers are helpful to see why.
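For example (my own variable names, and assuming a typical single-byte Latin1 collation), two-byte characters are silently turned into question marks when they get squeezed into a varchar:

```sql
-- Cyrillic text survives in nvarchar but is mangled when converted
-- to varchar under a single-byte code page.
DECLARE @v varchar(10)  = N'Привет',
        @n nvarchar(10) = N'Привет';

SELECT @v AS varchar_value,   -- '??????' on a typical Latin1 collation
       @n AS nvarchar_value;  -- 'Привет'
```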
The bottom line is that SQL Server expresses the length of a varchar or nvarchar column as the number of characters entered. It will take care of the storage for you. Don't worry about bytes!
NOTE: Adding to the confusion is that Oracle allows you to specify either a byte length or a character length in the native VARCHAR2 type:
Oracle VARCHAR2
With the increasing use of multi-byte character sets to support globalized databases comes the problem of bytes no longer equating to characters.

The VARCHAR2 and CHAR types support two methods of specifying lengths:

- In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character set.
- In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.
And it appears that the default is bytes!
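Here's a rough sketch of what that looks like on the Oracle side (table and column names are mine, and it assumes a multi-byte database character set such as AL32UTF8):

```sql
CREATE TABLE length_demo (
  ten_bytes VARCHAR2(10 BYTE),  -- room for 10 bytes, possibly fewer than 10 characters
  ten_chars VARCHAR2(10 CHAR)   -- room for 10 characters, possibly more than 10 bytes
);

-- 'Ñ' takes 2 bytes in AL32UTF8, so the BYTE column holds only five of them ...
INSERT INTO length_demo (ten_bytes) VALUES ('ÑÑÑÑÑ');
-- ... while the CHAR column accepts ten of them (20 bytes of storage).
INSERT INTO length_demo (ten_chars) VALUES ('ÑÑÑÑÑÑÑÑÑÑ');
```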
This seems to be creating confusion for more than just us:
Oracle varchar2 - bytes or chars
So if you're coming from an Oracle world, you might assume this is true everywhere. And if you're coming from a SQL Server world, you might not realize this is the case!
In SQL Server
The thing that confuses me is that UTF-8 encoded Unicode characters can take up to 4 bytes, and many take as few as 1 byte! And yet, the docs say each character takes exactly two bytes.
So really... How many bytes does one Unicode character take?
Answer: SQL Server is using UNICODE UCS-2, which uses a single code value (defined as one or more numbers representing a code point) between 0 and 65,535 for each character, and allows exactly two bytes (one 16-bit word) to represent that value.
Which explains why SQL Server can tell exactly how much space a character string will take based on its length: ALL characters take two bytes in an nvarchar column!
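A tiny illustration (the euro sign is just an example character I picked): every character maps to one code value between 0 and 65,535 and occupies exactly two bytes.

```sql
-- One character = one 16-bit code value = two bytes of storage.
SELECT UNICODE(N'€')    AS code_value,  -- 8364
       DATALENGTH(N'€') AS byte_count,  -- 2
       NCHAR(8364)      AS round_trip;  -- '€'
```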