I'm new to Microsoft SQL. I'm planning to store text in Microsoft SQL Server, and there will be special international characters. Is there a data type specific to Unicode, or am I better off encoding my text with a reference to the Unicode code point (e.g. \u0056)?
Store unicode with nvarchar - http://stackoverflow.com/questions/144283/what-is-the-difference-between-varchar-and-nvarchar – codingbiz Jun 10 '12 at 00:40
4 Answers
Use Nvarchar/Nchar (MSDN link). There used to be an Ntext datatype as well, but it's deprecated now in favour of Nvarchar. These columns take up twice as much space as their non-Unicode counterparts (char and varchar).
Then when "manually" inserting into them, use the N prefix to indicate it's Unicode text:
INSERT INTO MyTable(SomeNvarcharColumn)
VALUES (N'français')
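As a quick illustration of why the prefix matters (a sketch reusing the same table; the exact fallback character depends on the database's default collation):

-- Without N the literal is a varchar in the database's default code page,
-- so characters that code page cannot represent may be silently replaced (often by '?').
INSERT INTO MyTable(SomeNvarcharColumn) VALUES ('français');
-- With N the literal is nvarchar, so the characters are preserved as Unicode.
INSERT INTO MyTable(SomeNvarcharColumn) VALUES (N'français');
SELECT SomeNvarcharColumn FROM MyTable;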

When you say special international characters, what do you mean? If "special" means they aren't common and only occasional, then the overhead of nvarchar might not make sense for a table with a very large number of rows or a lot of indexing.
I'm all for using Unicode where appropriate, but understanding when it is appropriate is important.
If you are mixing data with different implied code pages (Japanese and Chinese in the same database), or you just want to be forward-looking for internationalization and localization, then you want the column to be Unicode, so use the nvarchar data type and that's perfectly fine. Unicode is not going to magically solve all sorting problems for you, though.
If you know that you will always be storing mainly ASCII with only occasional foreign characters, just store your UTF-8 data or HTML-encoded data in varchar. If your data is all in Japanese and code page 932 (or any other single code page), you can still store double-byte characters in varchar; they still take up two bytes. My point is that when you are already in a DBCS collation, international characters are no longer "special". And it's not just the data storage: any indexes are affected, as is the working set when dealing with such a column in queries and other dataflows.
And do not make a blanket rule that all character data should be nvarchar - it's a waste for many columns which are codes or identifiers.
Any time you have a column, go through the same questions:
What is the type of data?
What is the range?
Are NULLs allowed?
What is the limit of the size?
Are there any constraints I should apply now to stop bad data getting in from the beginning?
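For example, working through that checklist might lead to a mixed schema like this (purely an illustrative sketch; the table, columns, lengths, and constraint are invented for the example):

CREATE TABLE Product
(
    ProductId    INT IDENTITY(1,1) PRIMARY KEY,
    -- Fixed-format internal code: plain char is enough, no Unicode needed.
    ProductCode  CHAR(10)      NOT NULL,
    -- User-facing text that may contain international characters: use nvarchar.
    DisplayName  NVARCHAR(200) NOT NULL,
    Description  NVARCHAR(MAX) NULL,
    -- A simple constraint to stop bad data getting in from the beginning.
    CONSTRAINT CK_Product_Code
        CHECK (ProductCode LIKE '[A-Z][A-Z]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
);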

Character-set behaviour for tables and the strings inside them is specified at the database level: if your database has a Unicode collation, the strings inside the tables are Unicode. You also have to use the nvarchar or nchar data types for string columns so that they can store Unicode strings. But this only works if your database has a UTF-8 or Unicode character set or collation. Read this link for more information: Unicode and SQL Server
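To check which collation a database is actually using, you can query it directly (a minimal sketch; DATABASEPROPERTYEX and DB_NAME are standard T-SQL functions, the database name and the specific UTF-8 collation shown are only examples, and UTF-8 collations require SQL Server 2019 or later):

-- Show the collation of the current database.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Example of creating a database with a UTF-8 collation (SQL Server 2019+).
CREATE DATABASE MyUtf8Db COLLATE Latin1_General_100_CI_AS_SC_UTF8;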

Does this help? https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql – Kamran Amini Jan 24 '18 at 10:25
That is what I have read. If you feel it matches your answer please edit it in. – Simon Sobisch Jan 24 '18 at 11:43