"Normalize" for sorting
You could use regexp_replace()
with the pattern '[^a-zA-Z]'
in the ORDER BY
clause but that only recognizes pure ASCII letters. Better use the class shorthand '\W'
which recognizes additional non-ASCII letters in your locale like äüóèß
etc.
Or you could improvise and "normalize all characters with diacritic elements to their base form with the help of the unaccent()
function. Consider this little demo:
SELECT *
, regexp_replace(x, '[^a-zA-Z]', '', 'g')
, regexp_replace(x, '\W', '', 'g')
, regexp_replace(unaccent(x), '\W', '', 'g')
FROM (
SELECT 'XY ÖÜÄöüäĆČćč€ĞğīїıŁłŃńŇňŐőōŘřŠšŞşůŽžż‘´’„“”–—[](),;.:̈� XY'::text AS x) t
->SQLfiddle for Postgres 9.2.
->SQLfiddle for Postgres 9.1.
Regular expression code has been updated in version 9.2. I am assuming this is the reason for the improved handling in 9.2 where all letter characters in the example are matched, while 9.1 only matches some.
unaccent()
is provided by the additional module unaccent. Run:
CREATE EXTENSION unaccent;
once per database to use in (Postgres 9.1+, older versions use a different technique).
locales / collation
You must be aware that Postgres relies on the underlying operating system for locales (including collation). The sort order is governed by your chosen locale, or more specific LC_COLLATE
. More in this related answer:
String sort order (LC_COLLATE and LC_CTYPE)
There are plans to incorporate collation support into Postgres directly, but that's not available at this time.
Many locales ignore the special characters you describe for sorting character data out of the box. If you have a locale installed in your system that provides the sort order you are looking for, you can use it ad-hoc in Postgres 9.1 or later:
SELECT foo FROM bar ORDER BY foo COLLATE "xy_XY"
To see which collations are installed and available in your current Postgres installation:
SELECT * FROM pg_collation;
Unfortunately it is not possible to define your own custom collation (yet) unless you hack the source code.
The collation rules are usually governed by the rules of a language as spoken in a country. The sort order telephone books would be in, if there were still telephone books ... Your operating system provides them.
For instance, in Debian Linux you can use:
locale -a
to display all generated locales. And:
dpkg-reconfigure locales
as root user (one way of several) to generate / install more.