2

Motivation: I would like to work with strings in PostgreSQL in a case insensitive manner. I am aware of the CITEXT data type and I am also aware of functional indexes where I can use the LOWER function.

Still, the most efficient solution seems to be using a case insensitive collation - something trivial in Sql Server. Anyway, it seems that PostgreSQL is unable to define its own custom collations, instead it derives them from the locales found in the OS, i.e. Windows in my case.

So, the question is this - is it possible to create a custom Windows locale which would treat characters in a case insensitive manner?

The farthest I could get is install a locale builder and export the en-US locale to the respective XML representation (called LDML) to see what is inside. Looking for the sort keyword returns these two lines:

  <msLocale:sortName type="en-US" />
  <msLocale:sortGuid type="{00000001-57EE-1E5C-00B4-D0000BB1E11E}" />

The guid can be found in the Windows Registry:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\Sorting\Ids]
@="{00000001-57EE-1E5C-00B4-D0000BB1E11E}"
"mn-Mong"="{00000001-57EE-1E5C-00B4-D0000BB1E11E}"

(There are more string values under the key)

And this does not lead anywhere. I am no closer to a case insensitive custom locale than before.

It is possible that LDML can be used to describe a case insensitive locale, but I have no idea how to construct one.

Edit

Food for thought:

SQL Server:

SELECT 'Latin1_General_CS_AS' AS 'Collation',
    COLLATIONPROPERTY('Latin1_General_CS_AS', 'CodePage') AS 'CodePage', 
    COLLATIONPROPERTY('Latin1_General_CS_AS', 'LCID') AS 'LCID',
    CONVERT(VARBINARY(8), COLLATIONPROPERTY('Latin1_General_CS_AS', 'ComparisonStyle')) AS 'ComparisonStyle', 
    COLLATIONPROPERTY('Latin1_General_CS_AS', 'Version') AS 'Version'
UNION ALL
SELECT 'Latin1_General_CI_AS' AS 'Collation', 
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'CodePage') AS 'CodePage', 
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'LCID') AS 'LCID',
    CONVERT(VARBINARY(8), COLLATIONPROPERTY('Latin1_General_CI_AS', 'ComparisonStyle')) AS 'ComparisonStyle', 
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'Version') AS 'Version'

yields

Collation               CodePage    LCID    ComparisonStyle Version
Latin1_General_CS_AS    1252        1033    0x00030000      0
Latin1_General_CI_AS    1252        1033    0x00030001      0

Win32 API:

CompareStringEx Win32 function:

int CompareStringEx(
  _In_opt_  LPCWSTR lpLocaleName,
  _In_      DWORD dwCmpFlags,
  _In_      LPCWSTR lpString1,
  _In_      int cchCount1,
  _In_      LPCWSTR lpString2,
  _In_      int cchCount2,
  _In_opt_  LPNLSVERSIONINFO lpVersionInformation,
  _In_opt_  LPVOID lpReserved,
  _In_opt_  LPARAM lParam
);

The flags for the dwCmpFlags parameter can be found in C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Include\WinNls.h:

//
//  String Flags.
//
#define NORM_IGNORECASE           0x00000001  // ignore case
#define NORM_IGNORENONSPACE       0x00000002  // ignore nonspacing chars
#define NORM_IGNORESYMBOLS        0x00000004  // ignore symbols

#define LINGUISTIC_IGNORECASE     0x00000010  // linguistically appropriate 'ignore case'
#define LINGUISTIC_IGNOREDIACRITIC 0x00000020  // linguistically appropriate 'ignore nonspace'

#define NORM_IGNOREKANATYPE       0x00010000  // ignore kanatype
#define NORM_IGNOREWIDTH          0x00020000  // ignore width
#define NORM_LINGUISTIC_CASING    0x08000000  // use linguistic rules for casing

From which I conclude that:

  • using Latin1_General_CS_AS results in CompareStringEx being invoked with the flags NORM_IGNOREKANATYPE|NORM_IGNOREWIDTH
  • using Latin1_General_CI_AS results in CompareStringEx being invoked with the flags NORM_IGNOREKANATYPE|NORM_IGNOREWIDTH|NORM_IGNORECASE

But what is next? How can I create my own Windows locale similar to Latin1_General_CI_AS, but usable outside the SQL Server?

halfer
  • 19,824
  • 17
  • 99
  • 186
mark
  • 59,016
  • 79
  • 296
  • 580
  • I'm not familiar enough with Windows to be sure, but… wouldn't it be possible to compile Postgres so it can access the built-in windows collations? http://technet.microsoft.com/en-us/library/ms188046.aspx – Denis de Bernardy Dec 30 '13 at 02:35
  • There is no need to compile anything. PostgreSQL allows to expose locales via collations. That is neither the problem nor the question. The question is how to make a case insensitive locale in the first place. – mark Dec 30 '13 at 11:39
  • seeing that both windows and tsql expose such a thing, I'd expect one to be available at the os level, hence my question... – Denis de Bernardy Dec 30 '13 at 12:58
  • Please, elaborate on the windows exposing such a thing. I know of no case insensitive windows locale. Do you have a name? As of Sql Server's collations - true there are case insensitive collations there, but knowing that does not help me. I have no idea how to leverage that knowledge to achieve my goal. – mark Dec 30 '13 at 14:54
  • Well, the link I posted further up seems to list collation or charset names that exist in both windows and sql-server. What would be surprising to me is that windows, a case insensitive platform, would create special case insensitive collations for sql-server, and not expose the same for other apps to use. Then again, I haven't even touched windows with a 10' pole in 10 years, so I'm merely guessing. But my best guess is: there *are* case insensitive collations at the OS level, and if correct then the path of least effort would be to make Postgres become aware of them -- flag, path, whatever. – Denis de Bernardy Dec 30 '13 at 17:44
  • Unfortunately, it is impossible - see the accepted answer. – mark Jan 03 '14 at 12:19
  • I understand that it is not possible under windows, but is it possible to achieve a similar effect under linux? – Arash Mar 25 '14 at 14:24
  • Alas, I need it for Windows. You may have to ask a separate question for Linux. – mark Mar 25 '14 at 18:16

2 Answers2

1

Probably nobody did it. I am thinking so it is possible - but this scenario is not tested and nobody knows if there is some side effect.

This issue is in PostgreSQL ToDo list still and Craig Ringer sent a proposal how to implement it. http://www.postgresql.org/message-id/52C0C31C.4060804@2ndquadrant.com

P.S.

Pavel Stehule
  • 42,331
  • 5
  • 91
  • 94
1

In Windows, there are no case insensitive sort rules. All the rules are able to sort ignoring case but that is handled by means of a flag not by a sort. There is no way to add a custom sort method.

Eric MSFT
  • 3,246
  • 1
  • 18
  • 28
  • 1
    I do not want a custom sort method. I am perfectly fine with the standard one and giving it the ignore case flag. But how can I create such a locale that would be usable outside the Sql Server? – mark Dec 30 '13 at 21:08
  • I probably should have left off the word method in my response. I was not meaning this as a function but rather as a general strategy. Custom sorts are not supported in Windows. Locales are not able to bind to what would be a set of rules with a particular set of flags. Those types of sorts are specific to database implementations. – Eric MSFT Jan 02 '14 at 17:10
  • This is extremely sad. – mark Jan 03 '14 at 12:19
  • @EricMSFT: out of curiosity, is that to say that Windows itself achieves case insensitive sorting by setting a flag on top of a case sensitive collation, or am I misunderstanding your reply? – Denis de Bernardy Jan 03 '14 at 12:28
  • 1
    @Denis: Windows stores weights appropriate for the various levels of collation (see the [Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/) for an example of implementation of multi-level comparison). When case insensitive sorting is requested, we ignore the weight for the casing level. – Eric MSFT Jan 06 '14 at 21:09