0

I have just found out something very curious in PostgreSQL. Suppose we have a column "name" with values like

bbb
(ccc)
aaa
(eee)
ddd


If I do a SELECT (...) ORDER BY name ASC, the result is

aaa
bbb
(ccc)
ddd
(eee)


That was very clever, but what I had previously in MySQL (and what I really want now) is

(ccc)
(eee)
aaa
bbb
ddd



Any idea why that happens and how can I fix it?


Edit: as mentioned in the comments, it looks like the guilty lies on the en_US.UTF-8 collation. But I need it to correctly sort strings with latin characters. So is there any regex or an alternative collation to solve the problem?

Jan K. S.
  • 1,761
  • 1
  • 14
  • 13
  • It seems to be related to the environment. On http://sqlfiddle.com/#!15/a0876/3 this works the same way as you see it. If I do that in psql on my computer I get the `(ccc)` as the first row. Maybe some collation setting? –  Feb 10 '14 at 19:39
  • Yeah, I think it is related to collation or something. My environment uses C and doesn't have this problem; the production server uses en_US.UTF-8 and does have the problem. But I need some good collation to properly handle latin characters, so what do I do? – Jan K. S. Feb 10 '14 at 19:47
  • If you want to be independent of the collation it's probably better to order by a "clean" version of the column. Something like `order by regexp_replace(the_column, '[^a-zA-Z]', '')` –  Feb 10 '14 at 19:49
  • Many parts of the website use the regular sort (with en_US.UTF-8 collation), and only one have that parenthesis problem. So it would be nice to find a solution (perhaps with a regex) to address this case in particular. – Jan K. S. Feb 10 '14 at 19:53
  • The solution I showed you *does* use a regex. –  Feb 10 '14 at 20:10
  • Indeed, but it needs some adaptation. I will try something similar to replace the parenthesis for some character that comes before "a". I will post the answer here if it works. Thanks! – Jan K. S. Feb 10 '14 at 20:52
  • Yes, that is the same question. One of its solutions works for me. Thanks! – Jan K. S. Feb 12 '14 at 13:20

1 Answers1

2

If you are prefer collation C, why not explicitly use it in the order statement?

SELECT name FROM t ORDER BY (name COLLATE "C");

Works on my system.

Or if in specific, you'd like to ignore the difference between the same letter with different accents (like collation en-US etc), but NOT ignore brackets (like collation C), you can strip accents using unaccent() and then keep using C. (Even though this might give you random orders within the same letter.)

SELECT name FROM t ORDER BY (unaccent(name) COLLATE "C");

See how to get unaccent() working if (not already) here: Error when creating unaccent extension on PostgreSQL

Another solution, if your problem is exactly with an opening bracket on the first character, is the following:

SELECT name FROM t ORDER BY (NOT substring(name from 1 for 1)='(', name);

If this is not sufficient, I suggest to rethink why not ignoring brackets is so important for you and if you could include that information on another column etc.

Community
  • 1
  • 1
Daniel Sparing
  • 2,163
  • 15
  • 20
  • 1
    If that really is the solution, then changing the column to always use that collation is probably the easiest solution (as all existing queries will automatically apply that collation) –  Feb 10 '14 at 21:25
  • 1
    True, and indeed the fact that the dev and prod environments use different collations shows that something is wrong with propagating schema changes from dev to prod. – Daniel Sparing Feb 10 '14 at 21:29
  • Well, that solution will give me problems with latin accented characters. – Jan K. S. Feb 12 '14 at 13:16
  • 1
    You wrote earlier that "My environment uses C and doesn't have this problem" so now I am confused. Maybe you'll need to define your own collation :) Can you be more specific with what the "problems" are (example)? – Daniel Sparing Feb 12 '14 at 15:03
  • I have updated my answer stripping off accents or catching the opening bracket on the first character, how about now? -- If this is not sufficient, then can you explain in more detail why you can't ignore brackets? What do they mean, are they really only brackets? – Daniel Sparing Feb 12 '14 at 15:24