Simplistically, you can use the COLLATIONPROPERTY function, which gives you:
- CodePage
- LCID
- ComparisonStyle
- Version
The ComparisonStyle
is a bit-masked field that is encoded as follows:
- Case insensitivity (IgnoreCase) = 1
- Accent insensitivity (IgnoreNonSpace) = 2,
- Kana type insensitivity (IgnoreKanaType) = 65536
- Width insensitivity (IgnoreWidth) = 131072
Unfortunately, everything-sensitive (e.g. Latin1_General_CS_AS_KS_WS
) equates to 0. This is unfortunate because both _BIN
and _BIN2
collations also equate to 0. Hence you still need to check the name to see if it ends in _BIN%
to get the full picture.
But, this is not so simple. There are two main types of Collations: SQL Server collations and Windows Collations.
The SQL Server collations (i.e. starting with SQL_
) are deprecated and should not be used anymore, though a lot of systems to default to SQL_Latin1_General_CP1_CI_AS
.
For both types of collations, NCHAR / NVARCHAR / XML data uses the Unicode sorting algorithms. For non-Unicode data, the Windows collations should sort the same between SQL Server and .NET. However, for the SQL Server collations, the sorting algorithm does not necessarily match to the Windows collation (or possibly anything). But they do have their own Sort Order IDs and there might be public documentation describing those rules.
The Windows collations have several variations:
differing versions: unspecified should be the original set, then the first set of updates are labeled _90
and the newest updates are the _100
series.
differing binary ordering: the older _BIN
collations do not map to anything exactly in .NET since they compared the first character as a character. The newer _BIN2
collations are pure code-point comparisons and ordering and should map to the ordinal
ComparisonStyle.
Beyond the specifics of any particular collation, there is another factor complicating what you are trying to accomplish: the default collation for a database does not necessarily determine the collation used for sorting / comparing a particular predicate or field! The collation can be taken from the field being operated on, it can be taken from the database default for string literals and variables, or it can be overridden in both cases via the COLLATE
clause. Please see the MSDN page for Collation Precedence for more details.
In the end, there is no deterministic means of getting the collation(s) used because each predicate in a WHERE clause could potentially use a different collation, and that can be different from the collation used in the ORDER BY
, and JOIN conditions (and GROUP BY, etc) can have their collations.
But to simplify a little:
- If the data is non-Unicode, check the Code Page for the specified locale / LCID. Then use that to create the same Encoding in .NET.
- If the data is Unicode and not using a
_BIN
collation then it should match the same settings in .NET. Again, the _BIN2
collation should match the ordinal
ComparisonStyle.
- If the data is non-Unicode with a SQL Server collation or Windows
_BIN
collation, then cross your fingers, rub a lucky rabbit's foot (though not so lucky for the rabbit), etc.
But wait, there's more! Seriously.
You need to also consider:
- as with any standard, it is up to the implementer to follow the spec. That doesn't always happen. So even with what should be a truly equivalent collation between SQL Server and your Java app, and even if there are no issues with Collation Precedence, there can still be differences in sorting and comparisons. For an example, check out my "update" on this answer on DBA.StackExchange: Why does MS SQL Server return a result for empty string check when Unicode string is not empty
- If you are transferring data between .NET and Java, keep in mind that Java is UTF-16 Big Endian while .NET is UTF-16 Little Endian.