1

I need to select data from table and sort them with ORDER BY clause. The problem is the column contains text data with czech diacritic. I cannot use COLLATE, because the DB is part of postgres cluster which was created with lc_collate = en_US.UTF-8 and I cannot afford downtime caused by recreating the cluster with correct lc_collate.

Sample data:

CREATE TABLE test (
  id serial PRIMARY key,
  name text
);

INSERT INTO test (name) VALUES ('Žoo'), ('Zoo'), ('ŽOO'), ('ZOO'),
  ('ŽoA'), ('ŽóA'), ('ŽoÁ'), ('ŽóÁ');

Ideal output:

SELECT * FROM test ORDER BY name COLLATE "cs_CZ.utf8";
id | name
----+------
  2 | Zoo
  4 | ZOO
  5 | ŽoA
  7 | ŽoÁ
  6 | ŽóA
  8 | ŽóÁ
  1 | Žoo
  3 | ŽOO
(8 rows)

Here I found kind of solution:

SELECT * FROM test ORDER BY name USING ~<~;
id | name
----+------
  4 | ZOO
  2 | Zoo
  3 | ŽOO
  5 | ŽoA
  1 | Žoo
  7 | ŽoÁ
  6 | ŽóA
  8 | ŽóÁ
(8 rows)

The result is close enough (for my usage) - the caroned letters are AFTER the non-caroned.


My slightly off-topic Postgresql anabasis with ~<~ operator

edit: turned into new question.


Back to the question: Is there other solution to get the ideal order besides recreating the postgres cluster with correct locale?

Also some insght on the ~<~ operator would be nice.

Community
  • 1
  • 1
Vojtech Sokol
  • 123
  • 1
  • 8
  • What is your postgres version? – Ihor Romanchenko Feb 25 '16 at 14:17
  • BTW. You do not need downtime to use collation as in `ORDER BY name COLLATE "cs_CZ.utf8"`. You do not need downtime to add additional collation to existing postgres cluster: http://www.postgresql.org/docs/current/static/sql-createcollation.html – Ihor Romanchenko Feb 25 '16 at 14:22
  • Thank you for pointing that out. The downtime is unnecessary, all I needed was to add new locale on the server by adding line `cs_CZ.UTF-8 UTF-8` to `/etc/locale.gen` and regenerating locale with `locale-gen`. Then in Postgres: `CREATE COLLATION "cs_CZ.utf8" ( locale = 'cs_CZ.UTF-8' );` – Vojtech Sokol Mar 04 '16 at 23:29

3 Answers3

0

Im not sure if I understand the question, because look like you already found a solution. The only think I can suggest is you can add a new field czechName with the correct collate

http://www.postgresql.org/docs/current/static/sql-altertable.html

ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
0

Similar to what Juan Carlos Oropeza suggests, you can try to change the collation of the column with:

ALTER TABLE test ALTER COLUMN "name" TYPE text COLLATE 'cs_CZ.utf8';

Reference: http://www.postgresql.org/docs/current/static/sql-altertable.html

0

As @Igor pointed out in his comment, there is no need to recreate the postgres cluster with different lc_collate and deal with the caused downtime.

Exact steps that solved the problem were:

  1. add/uncomment line cs_CZ.UTF-8 UTF-8 in /etc/locale.gen

  2. generate the new locale:

    # locale-gen

  3. define new collation in postgres:

    CREATE COLLATION "cs_CZ.utf8" ( locale = 'cs_CZ.UTF-8' );

Vojtech Sokol
  • 123
  • 1
  • 8