28

I am new to postrges and want to sort varchar type columns. want to explain the problem with with below example:

table name: testsorting

   order       name
    1            b
    2            B
    3            a
    4            a1
    5            a11
    6            a2
    7            a20
    8            A
    9            a19

case sensitive sorting (which is default in postgres) gives:

select name from testsorting order by name;

    A
    B
    a
    a1
    a11
    a19
    a2
    a20
    b

case in-sensitive sorting gives:

select name from testsorting order by UPPER(name);

      A
      a
      a1
      a11
      a19
      a2
      a20
      B
      b

how can i make alphanumeric case in-sensitive sorting in postgres to get below order:

          a
          A
          a1
          a2
          a11
          a19
          a20
          b
          B

I wont mind the order for capital or small letters, but the order should be "aAbB" or "AaBb" and should not be "ABab"

Please suggest if you have any solution to this in postgres.

akhi
  • 664
  • 2
  • 10
  • 23
  • I'm adding a flat on: "case sensitive sorting (which is default in postgres)", that's not exactly right. default means "means the locale settings defined for the database." (Ref: PostgreSQL documentation collation.html). Example: on Windows, it may be case-insensitive. It is in my case on Windows 10. – Fabien Haddadi Sep 06 '19 at 02:01

6 Answers6

10

My PostgreSQL sorts the way you want. The way PostgreSQL compares strings is determined by locale and collation. When you create database using createdb there is -l option to set locale. Also you can check how it is configured in your environment using psql -l:

[postgres@test]$ psql -l
List of databases
 Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
---------+----------+----------+------------+------------+-----------------------
 mn_test | postgres | UTF8     | pl_PL.UTF8 | pl_PL.UTF8 |

As you see my database uses Polish collation.

If you created database using other collation then you can use other collation in query just like:

SELECT * FROM sort_test ORDER BY name COLLATE "C";
SELECT * FROM sort_test ORDER BY name COLLATE "default";
SELECT * FROM sort_test ORDER BY name COLLATE "pl_PL";

You can list available collations by:

SELECT * FROM pg_collation;

EDITED:

Oh, I missed that 'a11' must be before 'a2'.

I don't think standard collation can solve alphanumeric sorting. For such sorting you will have to split string into parts just like in Clodoaldo Neto response. Another option that is useful if you frequently have to order this way is to separate name field into two columns. You can create trigger on INSERT and UPDATE that split name into name_1 and name_2 and then:

SELECT name FROM sort_test ORDER BY name_1 COLLATE "en_EN", name_2;

(I changed collation from Polish into English, you should use your native collation to sort letters like aącć etc)

Michał Niklas
  • 53,067
  • 18
  • 70
  • 114
  • Thanks Michal. I checked psql -l but not showing me the configured locale. Using COLLATE "pl_PL" in SELECT worked and sorted the list in case in-sensitive however the problem still with the alphanumeric and "a2" is listed after "a11" and "a19". do you mean that using a proper COLLATE will solve alphanumeric sorting? – akhi Mar 15 '13 at 11:57
  • FYI, `'a2'` before `'a11'` is called _natural sorting_, and SQLite has a contrib for it: See https://sqlite.org/forum/info/65814bc11d873327 – ddevienne Feb 18 '22 at 09:08
5

If the name is always in the 1 alpha followed by n numerics format then:

select name
from testsorting
order by
    upper(left(name, 1)),
    (substring(name from 2) || '0')::integer
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • I am getting ERROR: invalid input syntax for integer: "" with this. Do i need to check collate?? If I remove ::integer at end, i got output which is not exactly sorted. a11 and a19 listed before a2, which is wrong(A, a , a1,a11,a19,a2, a20, B, b). The correct sorted order should be A, a , a1,a2,a11,a19,a20, B, b. – akhi Mar 18 '13 at 11:37
  • Thanks Clodoaldo. I got this error because of null value, so need additional overhead to handle null value in the approach you suggested. any comments? – akhi Mar 19 '13 at 04:18
  • @Akhilesh A null value does not produce that error message. Check select (substring(null from 2) || '0')::integer. What is the exact error message? – Clodoaldo Neto Mar 19 '13 at 16:14
  • ah atlast found what i wanted. – Abdul Baig Sep 04 '14 at 07:13
2

PostgreSQL uses the C library locale facilities for sorting strings. C library is provided by the host operating system. On Mac OS X or a BSD-family operating system,the UTF-8 locale definitions are broken and hence the results are as per collation "C".

image attached for collation results with ubuntu 15.04 as host OS

Check FAQ's on postgres wiki for more details : https://wiki.postgresql.org/wiki/FAQ

Manu Singhal
  • 309
  • 1
  • 8
2

As far as I'm concerned, I have used the PostgreSQL module citext and used the data type CITEXT instead of TEXT. It makes both sort and search on these columns case insensitive.

The module can be installed with the SQL command CREATE EXTENSION IF NOT EXISTS citext;

Alexis.Rolland
  • 5,724
  • 6
  • 50
  • 77
1

I agree with Clodoaldo Neto's answer, but also don't forget to add the index

CREATE INDEX testsorting_name on testsorting(upper(left(name,1)), substring(name from 2)::integer)
Chris Aitchison
  • 4,656
  • 1
  • 27
  • 43
  • This is the index, not the order by. It won't sort your list, Clodoaldo Neto's answer will. This will make the sort efficient. – Chris Aitchison Mar 18 '13 at 09:22
  • This index probably won't be used for sorting. Only a unique index is used for sorting as far as I can see it here with 9.3. I'd happily be proven wrong, though... – Risadinha Oct 31 '14 at 16:16
  • Any B-tree index can be utilised for a sort, regardless of if it is unique: http://www.postgresql.org/docs/9.3/static/indexes-ordering.html. Obviously depends on an index existing that reflects the specific query being made. – Chris Aitchison Nov 20 '14 at 00:15
0

Answer strongly inspired from this one.
By using a function it will be easier to keep it clean if you need it over different queries.

CREATE OR REPLACE FUNCTION alphanum(str anyelement)
   RETURNS anyelement AS $$
BEGIN
   RETURN (SUBSTRING(str, '^[^0-9]*'),
      COALESCE(SUBSTRING(str, '[0-9]+')::INT, -1) + 2000000);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

Then you could use it this way:

SELECT name FROM testsorting ORDER BY alphanum(name);

Test:

WITH x(name) AS (VALUES ('b'), ('B'), ('a'), ('a1'),
   ('a11'), ('a2'), ('a20'), ('A'), ('a19'))
SELECT name, alphanum(name) FROM x ORDER BY alphanum(name);

 name |  alphanum   
------+-------------
 a    | (a,1999999)
 A    | (A,1999999)
 a1   | (a,2000001)
 a2   | (a,2000002)
 a11  | (a,2000011)
 a19  | (a,2000019)
 a20  | (a,2000020)
 b    | (b,1999999)
 B    | (B,1999999)
Community
  • 1
  • 1
Le Droid
  • 4,534
  • 3
  • 37
  • 32