Which encoding is used for `varchar` in SQL Server?
Can we dynamically change the encoding of `varchar`?
Late to the party, but the question remains very pertinent...

The first question is "Which encoding is used for `varchar` in SQL Server?". That's a great question!

Fast answer: it depends on the COLLATION (see below). For the most common COLLATIONs, like the defaults `Latin1_General_CI_AI` or `SQL_Latin1_General_CP1_CI_AS`, the ENCODING will be Windows-1252.

But since SQL Server 2019 we have COLLATIONs like `Latin1_General_100_CI_AS_SC_UTF8`, which use the UTF-8 ENCODING!
Complete answer:

MSSQL Server uses the COLLATION to determine what ENCODING is used on `char`/`nchar`/`varchar`/`nvarchar` fields. So, contrary to what a lot of people think, COLLATION is not only about sorting and comparing data, but also about ENCODING, and by consequence, about how our data is stored.

So, HOW DO WE KNOW WHICH ENCODING IS USED BY OUR COLLATION? With this:
SELECT COLLATIONPROPERTY( 'Latin1_General_CI_AI' , 'CodePage' ) AS [CodePage]
--returns 1252
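The same check works for the UTF-8 collations added in SQL Server 2019; as a sketch (runnable only on 2019+), those report code page 65001, which is the Windows code page identifier for UTF-8:

```sql
-- On SQL Server 2019+, UTF-8 enabled collations report code page 65001 (UTF-8)
SELECT COLLATIONPROPERTY('Latin1_General_100_CI_AS_SC_UTF8', 'CodePage') AS [CodePage];
--returns 65001
```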
This simple SQL returns the Windows Code Page for a COLLATION. A Windows Code Page is nothing more than another mapping to ENCODINGs. For the `Latin1_General_CI_AI` COLLATION it returns the Windows Code Page code `1252`, which maps to the Windows-1252 ENCODING.
So, for a `varchar` column with the `Latin1_General_CI_AI` COLLATION, the field will handle its data using the Windows-1252 ENCODING, and can only correctly store characters supported by that encoding.
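A minimal sketch of that limitation (the table variable and sample strings are hypothetical): characters outside Windows-1252 get replaced with `?` when stored in such a column:

```sql
-- Hypothetical demo: Windows-1252 cannot represent Japanese characters
DECLARE @t TABLE (txt VARCHAR(50) COLLATE Latin1_General_CI_AI);
INSERT INTO @t (txt) VALUES (N'Résumé'), (N'こんにちは');
SELECT txt FROM @t;
-- 'Résumé' survives (é exists in Windows-1252); the Japanese text comes back as '?????'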
UTF-8

Starting with SQL Server 2019, WE CAN USE `CHAR`/`VARCHAR` FIELDS AND FULLY SUPPORT UNICODE using the UTF-8 ENCODING!!!
From Microsoft's "char and varchar (Transact-SQL)" documentation:
Starting with SQL Server 2019 (15.x), when a UTF-8 enabled collation is used, these data types store the full range of Unicode character data and use the UTF-8 character encoding. If a non-UTF-8 collation is specified, then these data types store only a subset of characters supported by the corresponding code page of that collation.
So if we use a SQL Server older than 2019, like SQL Server 2008 R2 for example, we need to check the ENCODING using the method explained before. But if we use SQL Server 2019 or newer, and define a COLLATION like `Latin1_General_100_CI_AS_SC_UTF8`, then our field will use the UTF-8 ENCODING, which is by far the most widely used encoding and supports all the UNICODE characters.
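As a sketch (table and column names are hypothetical, SQL Server 2019+ only), declaring a `varchar` column with a UTF-8 collation looks like this:

```sql
-- Hypothetical table: a varchar column backed by UTF-8 thanks to the _UTF8 collation
CREATE TABLE dbo.Messages (
    Id   INT IDENTITY PRIMARY KEY,
    Body VARCHAR(200) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);
INSERT INTO dbo.Messages (Body) VALUES (N'こんにちは'); -- stored correctly as UTF-8
```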
You can see a more extended explanation covering ENCODINGs in the `char`/`nchar`/`varchar`/`nvarchar` fields, and other details, in this answer: https://stackoverflow.com/a/63637996/3395460
The second question is "Can we dynamically change the encoding of `varchar`?".

You can, although I don't see a good reason for that. With an ALTER TABLE you can change the COLLATION of a field, hence changing its ENCODING (actually, you can even change the collation of the entire database):
ALTER TABLE dbo.MyTable
ALTER COLUMN MyColumn VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8
There it is. But remember: converting the data in those fields can cause data loss!
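Before altering a column toward a narrower code page, a hedged pre-check (table, column, and target collation here are only examples) is to round-trip each value through `nvarchar` under the target collation and flag rows that change:

```sql
-- Hypothetical pre-check: find rows that would lose characters when
-- converted to the Windows-1252 code page of a non-UTF-8 target collation
SELECT MyColumn
FROM dbo.MyTable
WHERE MyColumn <> CAST(CAST(MyColumn AS NVARCHAR(50))
                       COLLATE Latin1_General_CI_AI AS VARCHAR(50));
```

Casting `nvarchar` back to `varchar` uses the code page of the expression's collation, so any character the target code page cannot represent becomes `?` and the comparison flags the row.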
Collation is defined at three different levels, each one overriding the previous.

First off you have the server collation - that is the collation defined when you installed the instance, and it affects all subsequent objects unless redefined elsewhere. The second is the database collation - this overrides the server collation and affects all the objects in the DB. Last off, each column in a table can have its own collation.
Select Convert (Varchar, ServerProperty('collation'));
Select name, collation_name
From sys.databases;
Select name, collation_name
From sys.columns
Where name = N'<ColumnName>' And Object_Id = Object_Id('<Table\ViewName>')
Changing the collation of a column isn't always easy, as you have to locate all references to it first. This script might help there.

Karasu CZ is right about one thing - an NVarChar sidesteps the code-page problem, since NVarChar storage doesn't depend on the collation's code page. But that's not what you asked for!
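One way to take inventory before such a migration (the collation name below is just an example) is to list every column still using a given collation:

```sql
-- Inventory query: list all columns in the current database using a specific collation
SELECT Object_Name(c.object_id) AS TableName,
       c.name                   AS ColumnName,
       c.collation_name
FROM sys.columns AS c
WHERE c.collation_name = N'SQL_Latin1_General_CP1_CI_AS';
```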