
SQL Server makes it very easy for developers to create VARCHAR and NVARCHAR columns. However, I see that for the same data stored in the column, the SQL Server database size is almost double.

Of course I'm exaggerating a little, but the database size really is high. I looked at the SQL Server MDF size and also compared the size of a table with a VARCHAR column against the same table with an NVARCHAR column.

Does that mean that we need to use NVARCHAR sensibly?
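The doubling you're seeing can be reproduced directly. As a minimal T-SQL sketch (variable names are illustrative), `DATALENGTH` reports the byte storage of the same string in each type:

```sql
-- Same 5-character string stored in each type.
DECLARE @v VARCHAR(50)  = 'Hello',
        @n NVARCHAR(50) = N'Hello';

SELECT DATALENGTH(@v) AS varchar_bytes,   -- 5  (1 byte per character)
       DATALENGTH(@n) AS nvarchar_bytes;  -- 10 (2 bytes per character)
```

NVARCHAR stores characters as UTF-16 code units, so ASCII-only data occupies twice the bytes it would in VARCHAR; that ratio is what shows up at the table and file level.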

Anand
    It's 2017. How much are you paying per MB of disk space? – Damien_The_Unbeliever Jun 07 '17 at 14:40
  • You need to use all column types sensibly. NVARCHAR is 2x the size of VARCHAR because that's literally the definition. Varchar uses 1 byte, nvarchar uses 2 (for bigger unicode data) https://dba.stackexchange.com/questions/36081/write-differences-between-varchar-and-nvarchar Do you need to store unicode data? If not, then sensibly choose varchar. – Jacob H Jun 07 '17 at 14:40
  • If you're not using unicode, then there's no reason to use NVARCHAR. – SS_DBA Jun 07 '17 at 14:41
    `Does that mean that we need to use NVARCHAR sensibly?` <= **Yes and Always**. Like any types use the one that best fits the business use. Otherwise there would only be 4 or 5 types ever because why even bother? What you choose and the size of the column depends on what you will be storing. Storing names? I recommend nvarchar(500) because names could contain unicode characters (*500 is up for debate*). Storing US State codes? Then use char(2) because that is a defined and known list that will never have unicode characters and always have a length of 2. – Igor Jun 07 '17 at 14:47
  • @JacobH that's not the definition. N means Unicode. .NET uses Unicode. So do all web sites. It doesn't make sense to use varchar unless you want to deal with conversion errors. Besides, SQL Server 2016 provides compression even in LocalDB – Panagiotis Kanavos Jun 07 '17 at 14:47
  • I'm not going to argue semantics with a random person on the internet. You just repeated what I said with different words. – Jacob H Jun 07 '17 at 14:48
  • @WEI_DBA which doesn't actually happen. Some legacy databases may get away with using codepages, if and only if they NEVER accept data from other languages. – Panagiotis Kanavos Jun 07 '17 at 14:49
  • @JacobH actually, you are wrong in several points. And that random person used to be a SQL Server MVP. UTF16 is *not* strictly 2 bytes. It can use more than 2. And getting away with varchar works only in controlled environments. Web sites aren't. – Panagiotis Kanavos Jun 07 '17 at 14:50
  • @JacobH in fact that "definition" has changed from one SQL Server version to another. It started as UCS2, a subset of UTF16. Now [it's UTF16 LE](https://stackoverflow.com/questions/9008043/ucs-2-and-sql-server) – Panagiotis Kanavos Jun 07 '17 at 14:54
  • I too can copy and paste from the documentation. "The storage size, in bytes, is two times the actual length of data entered + 2 bytes." So by "can use more than 2 bytes" you really meant "uses 2x +2 bytes"? We aren't even touching the fact that you assumed the OP is doing a web application when it wasn't even mentioned. – Jacob H Jun 07 '17 at 15:03
  • The application that I have has a lot of text box fields, and users are allowed to copy and paste anything (including non-ASCII characters). If they do so, VARCHAR throws errors, and that's the real problem. – Anand Jun 09 '17 at 05:02

1 Answer


This depends on your requirements.

If you need to store Unicode, then you have to use NVARCHAR. If you only store ASCII, or only one language whose characters fit in 8 bits, you can still use NVARCHAR; if that takes "too much" space, use VARCHAR instead.
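The risk with the VARCHAR route is what the comments describe: under a typical Latin codepage collation, characters outside the codepage are silently replaced rather than stored. A small sketch (the Ukrainian sample string is illustrative):

```sql
-- With a default Latin1 collation, non-Latin text does not survive VARCHAR.
DECLARE @v VARCHAR(20)  = N'Київ',  -- unmappable characters become '?'
        @n NVARCHAR(20) = N'Київ';  -- preserved intact as UTF-16

SELECT @v AS varchar_result,   -- '????' under a Latin1 collation
       @n AS nvarchar_result;  -- 'Київ'
```

So "only one language, 8-bit" is safe only if the database, every client, and every import path agree on the codepage; otherwise NVARCHAR is the safer default.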

coding Bott
  • Even with one language there may be multiple codepages. No matter the codepage though, using `varchar` can lead to conversion errors when *client* applications store data using the wrong codepage/collation. Storage isn't an issue - the size difference is *not* that important, even in data warehouses. Table compression is available in all SQL Server editions since SQL Server 2016 SP1. – Panagiotis Kanavos Jun 07 '17 at 15:24
  • Try storing beer bottle sales data from Ukrainian or Polish breweries (real example). That's 3 collations/codepages at least (German, Ukrainian, Polish). Never mind that some of the data may come from DB2 or mainframes. Or try joining a table with German collation with a Latin1. You'll get a full-table scan even if you have indexes – Panagiotis Kanavos Jun 07 '17 at 15:26
  • I don't see a problem storing one language at 8 bits per character (varchar) with the correct collation set. I never talked about mixing languages; you are right that in that case it will end in a big mess. – coding Bott Jun 07 '17 at 15:30
  • The problem is that you'll have to enforce the codepage *everywhere* - in the database, clients, import jobs *and* source code (to avoid mangled literals). Even [Windows 1252](https://en.wikipedia.org/wiki/Windows-1252) itself isn't quite ANSI. "Smart" quotes and apostrophes can get mangled. – Panagiotis Kanavos Jun 07 '17 at 15:48