3

What I would like is the equivalent of using utf8_unicode_ci in MySQL. So if I have those strings (default sort order with Postgresql):

  • Barn
  • Bubble
  • Bœuf
  • beef
  • boulette
  • bémol

I wish they would be sorted like this (as with utf8_unicode_ci in MySQL):

  • Barn
  • beef
  • bémol
  • Bœuf
  • boulette
  • Bubble

This kind of sort is case insensitive, accent insensitive and ligatures are converted to multiple characters.

I know about unaccent and lower in Postgresql but I have no idea how to use them from Django.

Possible solutions with Django/Postgresql:

  • Add new column only for sorting with data normalized (lower, unaccent).
  • Add an index (like in this answer), but I'm not sure how it will work with Django?

I don't think Full Text Search or Trigram could help me here because I'm not necessarily doing searches base on text but I need to get the good sort order.

Ideally queries should be fast so using another indexed column looks like a good avenue. But I wish to find a solution that I don't need to implement for every exisiting text column in my DB, that is easy to maintain, etc. Is there a best practice to do that?

Community
  • 1
  • 1
Etienne
  • 12,440
  • 5
  • 44
  • 50

2 Answers2

3

It isn't related to Django itself, PostgreSQL's lc_collate configuration determines this. I'd suggest you to review its value:

SHOW lc_collate;

The right thing to do is fix this configuration. Don't forget to take a look on related settings too (lc_ctype, etc.).

But if you cannot create another database with the right setting, try to explicit collate on ORDER like the following test case:

CREATE TEMPORARY TABLE table1 (column1 TEXT); 

INSERT INTO table1 VALUES('Barn'),
('beef'),
('bémol'),
('Bœuf'),
('boulette'),
('Bubble');

SELECT * FROM table1 ORDER BY column1 COLLATE "en_US"; --Gives the expected order
SELECT * FROM table1 ORDER BY column1 COLLATE "C"; --Gives "wrong" order  (in your case)

It's important to remember that PostgreSQL relies on operating system locales. This test case was executed on CentOS 7. More info here and here.

Community
  • 1
  • 1
Michel Milezzi
  • 10,087
  • 3
  • 21
  • 36
  • To my knowledge, Postgresql do not have any collation that will do the kind of sort I want. If you could prove me wrong I'll be happy! – Etienne May 15 '17 at 19:46
  • @Etienne I added a test case to my answer. – Michel Milezzi May 15 '17 at 20:00
  • I tried your test case and both `SELECT` gives the same sort order. But I'm on an old OS X system and reading your message it remember me that I probably had this problem on this system a few years ago! I will test it on another system and come back. Thanks! – Etienne May 15 '17 at 20:08
  • You are right, my problem is bad system's locales on my old OS X system. It's really time to upgrade this system! Thanks again. – Etienne May 15 '17 at 20:37
  • Glad to help you @Etienne – Michel Milezzi May 15 '17 at 20:39
0

I did this way:

But you need enable in your postgresql the module 'unaccent' before this way: CREATE EXTENSION unaccent;

def get_value_ci(field):
    return Func(field, function='LOWER', template='UNACCENT(%(function)s(%(expressions)s))')

YoutModel.objects.order_by(get_value_ci('nome_your_field'))

and works, ;)

Mateus Padua
  • 183
  • 2
  • 8