28

I've got a Postgres ORDER BY issue with the following table:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

To insert a new record to the table,

  1. I select the last record with SELECT * FROM employees ORDER BY em_code DESC
  2. Strip alphabets from em_code usiging reg exp and store in ec_alpha
  3. Cast the remating part to integer ec_num
  4. Increment by one ec_num++
  5. Pad with sufficient zeors and prefix ec_alpha again

When em_code reaches EM1000, the above algorithm fails.

First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.

Any idea how to select EM1000?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Mithun Sreedharan
  • 49,883
  • 70
  • 181
  • 236
  • 6
    Why don't you just use sequence? – zerkms Feb 07 '12 at 09:12
  • 3
    Why not use a sequence? http://www.postgresql.org/docs/current/static/sql-createsequence.html – Mark Byers Feb 07 '12 at 09:12
  • 5
    Oh dear a classic. Rule one for computerising paper systems, intelligent numbers are dumb. put em in one column, an int in the other sort by both. Build it back into one id for the dumb humans when you need to. – Tony Hopkinson Feb 07 '12 at 09:27
  • @Erwin Brandstetter Thanks for those edits – Mithun Sreedharan Mar 24 '12 at 04:18
  • Note that the approach you describe will fail miserably if more than one transaction at a time is `INSERT`ing, unless you `LOCK TABLE` first. Regarding the sort, you might find [this question](http://stackoverflow.com/questions/12965463/humanized-or-natural-number-sorting-of-mixed-word-and-number-strings) quite informative. – Craig Ringer Jul 20 '13 at 12:23

9 Answers9

27

Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.

https://www.postgresql.org/docs/10/collation.html

-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;

Now just query as you would otherwise.

SELECT * FROM employees ORDER BY em_code

On my data, I get results in this order (note that it also sorts foreign numerals):

Value
0
0001
001
1
06
6
13
۱۳
14
Bouke Versteegh
  • 4,097
  • 1
  • 39
  • 35
18

One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

To use it simply call the function in your order by:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
Jonathan
  • 18,229
  • 10
  • 57
  • 56
13

The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9. You could solve it like this:

SELECT * FROM employees
ORDER  BY substring(em_code, 3)::int DESC;

It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.

Answer to question in comment

To strip any and all non-digits from a string:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM   employees;

\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.

After replacing every non-digit with the empty string, only digits remain.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
8

This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:

https://github.com/Bjond/pg_natural_sort_order

It's free to use, MIT license.

Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.

The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.

sagneta
  • 1,546
  • 18
  • 26
5

you can use just this line "ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"

ykhlef hamza
  • 51
  • 1
  • 1
2

I wrote about this in detail in this related question:

Humanized or natural number sorting of mixed word-and-number strings

(I'm posting this answer as a useful cross-reference only, so it's community wiki).

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
2

I came up with something slightly different.

The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )
Justin L.
  • 3,957
  • 3
  • 31
  • 29
0

I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.

https://stackoverflow.com/a/47522040/935122

I've also put it on GitHub

https://github.com/ccsalway/dbNaturalSort

Christian
  • 3,708
  • 3
  • 39
  • 60
0

The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:

create function natsort(s text) returns text immutable language sql as $$
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;

The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.

Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

vog
  • 23,517
  • 11
  • 59
  • 75