I need to select data from table and sort them with ORDER BY clause. The problem is the column contains text data with czech diacritic. I cannot use COLLATE, because the DB is part of postgres cluster which was created with lc_collate = en_US.UTF-8 and I cannot afford downtime caused by recreating the cluster with correct lc_collate.
Sample data:
CREATE TABLE test (
id serial PRIMARY key,
name text
);
INSERT INTO test (name) VALUES ('Žoo'), ('Zoo'), ('ŽOO'), ('ZOO'),
('ŽoA'), ('ŽóA'), ('ŽoÁ'), ('ŽóÁ');
Ideal output:
SELECT * FROM test ORDER BY name COLLATE "cs_CZ.utf8";
id | name
----+------
2 | Zoo
4 | ZOO
5 | ŽoA
7 | ŽoÁ
6 | ŽóA
8 | ŽóÁ
1 | Žoo
3 | ŽOO
(8 rows)
Here I found kind of solution:
SELECT * FROM test ORDER BY name USING ~<~;
id | name
----+------
4 | ZOO
2 | Zoo
3 | ŽOO
5 | ŽoA
1 | Žoo
7 | ŽoÁ
6 | ŽóA
8 | ŽóÁ
(8 rows)
The result is close enough (for my usage) - the caroned letters are AFTER the non-caroned.
My slightly off-topic Postgresql anabasis with ~<~
operator
edit: turned into new question.
Back to the question: Is there other solution to get the ideal order besides recreating the postgres cluster with correct locale?
Also some insght on the ~<~
operator would be nice.