1

Which encoding is used for varchar in SQL server?

Can we dynamic change encoding of varchar?

Patrick Hofman
  • 153,850
  • 22
  • 249
  • 325
Denis
  • 89
  • 1
  • 9
  • 1
    possible duplicate of [Base64 encoding in SQL Server 2005 T-SQL](http://stackoverflow.com/questions/5082345/base64-encoding-in-sql-server-2005-t-sql) –  Sep 18 '15 at 07:43

2 Answers2

4

Late for the party, but the question remains very pertinent...


First question is "Which encoding is used for varchar in SQL server?". That's a great question!

Fast Answer: Depends on the COLLATION (see below). For the most common COLLATIONs, like the default ones as Latin1_General_CI_AI or SQL_Latin1_General_CP1_CI_AS, the ENCODING will be Windows-1252.
But since SQL Server 2019 we have COLLATIONs like Latin1_General_100_CI_AS_SC_UTF8 which uses UTF-8 ENCODING!


Complete Answer:

MSSQL Server uses the COLLATION to determine what ENCODING is used on char/nchar/varchar/nvarchar fields. So, differently than a lot think, COLLATION is not only about sorting and comparing data, but also about ENCODING, and by consequence: how our data will be stored.

So, HOW WE KNOW WHAT IS THE ENCODING USED BY OUR COLLATION? With this:

SELECT COLLATIONPROPERTY( 'Latin1_General_CI_AI' , 'CodePage' ) AS [CodePage]
--returns 1252

This simple SQL returns the Windows Code Page for a COLLATION. A Windows Code Page is nothing more than another mapping to ENCODINGs. For the Latin1_General_CI_AI COLLATION it returns the Windows Code Page code 1252 , that maps to Windows-1252 ENCODING.
So, for a varchar column, with Latin1_General_CI_AI COLLATION, this field will handle its data using the Windows-1252 ENCODING, and only correctly store characters supported by this encoding.

UTF-8

Starting with SQL Server 2019, WE CAN USE CHAR/VARCHAR fields and fully support UNICODE using UTF-8 ENCODING!!!

From Microsoft's "char and varchar (Transact-SQL)" documentation:

Starting with SQL Server 2019 (15.x), when a UTF-8 enabled collation is used, these data types store the full range of Unicode character data and use the UTF-8 character encoding. If a non-UTF-8 collation is specified, then these data types store only a subset of characters supported by the corresponding code page of that collation.

https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver15

So if we use SQL Server older that 2019, like SQL Server 2008 R2 for example, we need to check the ENCODING using the method explained before. But if we use SQL Server 2019 or newer, and define a COLLATION like Latin1_General_100_CI_AS_SC_UTF8, then our field will use UTF-8 ENCODING which is by far the most used and efficient encoding that supports all the UNICODE characters.

You can see a more extended explanation covering ENCODINGs in the char/nchar/varchar/nvarchar fields, and other details on this answer: https://stackoverflow.com/a/63637996/3395460



Second question is "Can we dynamic change encoding of varchar?".

You can, although I don't see a good reason for that. With a ALTER TABLE, you can change a COLLATION of the entire table, or just a field, hence, changing the ENCONDING of it (actually you can even change the collation of entire database).

ALTER TABLE dbo.MyTable 
ALTER COLUMN MyColumn VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8

There it is. But remember, the conversion of the data on those fields can cause data loss!

Vitox
  • 3,852
  • 29
  • 30
0

Collation is defined at three different levels, each one overriding the previous.

First off you have the Server collation - that is the collation defined when you installed the instance and affects all subsequent objects unless redefined elsewhere The second is the Database collation - this overrides the server collation and effects all the objects in the DB. Last off each column in a table can have it's own collation.

Select  Convert (Varchar, ServerProperty('collation'));

Select  name, collation_name
From    sys.databases;

Select name, collation_name
From sys.columns
Where name = N'<ColumnName>' And Object_Id = Object_Id('<Table\ViewName>')

Changing the collation of a column isn't always easy as you have to locate all references to it first This Script might help there.

Karasu CZ's right about one thing - an NVarChar resolves all these issues as NVarChar's are collation free. But that's not what you asked for!

Rachel Ambler
  • 1,440
  • 12
  • 23