Motivation: I would like to work with strings in PostgreSQL in a case insensitive manner. I am aware of the CITEXT
data type and I am also aware of functional indexes where I can use the LOWER
function.
Still, the most efficient solution seems to be using a case insensitive collation - something trivial in Sql Server. Anyway, it seems that PostgreSQL is unable to define its own custom collations, instead it derives them from the locales found in the OS, i.e. Windows in my case.
So, the question is this - is it possible to create a custom Windows locale which would treat characters in a case insensitive manner?
The farthest I could get is install a locale builder and export the en-US locale to the respective XML representation (called LDML) to see what is inside. Looking for the sort keyword returns these two lines:
<msLocale:sortName type="en-US" />
<msLocale:sortGuid type="{00000001-57EE-1E5C-00B4-D0000BB1E11E}" />
The guid can be found in the Windows Registry:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\Sorting\Ids]
@="{00000001-57EE-1E5C-00B4-D0000BB1E11E}"
"mn-Mong"="{00000001-57EE-1E5C-00B4-D0000BB1E11E}"
(There are more string values under the key)
And this does not lead anywhere. I am no closer to a case insensitive custom locale than before.
It is possible that LDML can be used to describe a case insensitive locale, but I have no idea how to construct one.
Edit
Food for thought:
SQL Server:
SELECT 'Latin1_General_CS_AS' AS 'Collation',
COLLATIONPROPERTY('Latin1_General_CS_AS', 'CodePage') AS 'CodePage',
COLLATIONPROPERTY('Latin1_General_CS_AS', 'LCID') AS 'LCID',
CONVERT(VARBINARY(8), COLLATIONPROPERTY('Latin1_General_CS_AS', 'ComparisonStyle')) AS 'ComparisonStyle',
COLLATIONPROPERTY('Latin1_General_CS_AS', 'Version') AS 'Version'
UNION ALL
SELECT 'Latin1_General_CI_AS' AS 'Collation',
COLLATIONPROPERTY('Latin1_General_CI_AS', 'CodePage') AS 'CodePage',
COLLATIONPROPERTY('Latin1_General_CI_AS', 'LCID') AS 'LCID',
CONVERT(VARBINARY(8), COLLATIONPROPERTY('Latin1_General_CI_AS', 'ComparisonStyle')) AS 'ComparisonStyle',
COLLATIONPROPERTY('Latin1_General_CI_AS', 'Version') AS 'Version'
yields
Collation CodePage LCID ComparisonStyle Version
Latin1_General_CS_AS 1252 1033 0x00030000 0
Latin1_General_CI_AS 1252 1033 0x00030001 0
Win32 API:
CompareStringEx Win32 function:
int CompareStringEx(
_In_opt_ LPCWSTR lpLocaleName,
_In_ DWORD dwCmpFlags,
_In_ LPCWSTR lpString1,
_In_ int cchCount1,
_In_ LPCWSTR lpString2,
_In_ int cchCount2,
_In_opt_ LPNLSVERSIONINFO lpVersionInformation,
_In_opt_ LPVOID lpReserved,
_In_opt_ LPARAM lParam
);
The flags for the dwCmpFlags
parameter can be found in C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Include\WinNls.h
:
//
// String Flags.
//
#define NORM_IGNORECASE 0x00000001 // ignore case
#define NORM_IGNORENONSPACE 0x00000002 // ignore nonspacing chars
#define NORM_IGNORESYMBOLS 0x00000004 // ignore symbols
#define LINGUISTIC_IGNORECASE 0x00000010 // linguistically appropriate 'ignore case'
#define LINGUISTIC_IGNOREDIACRITIC 0x00000020 // linguistically appropriate 'ignore nonspace'
#define NORM_IGNOREKANATYPE 0x00010000 // ignore kanatype
#define NORM_IGNOREWIDTH 0x00020000 // ignore width
#define NORM_LINGUISTIC_CASING 0x08000000 // use linguistic rules for casing
From which I conclude that:
- using
Latin1_General_CS_AS
results inCompareStringEx
being invoked with the flagsNORM_IGNOREKANATYPE|NORM_IGNOREWIDTH
- using
Latin1_General_CI_AS
results inCompareStringEx
being invoked with the flagsNORM_IGNOREKANATYPE|NORM_IGNOREWIDTH|NORM_IGNORECASE
But what is next? How can I create my own Windows locale similar to Latin1_General_CI_AS
, but usable outside the SQL Server?