2

I have a table column numbers containing strings like:

1, 2, 2A, 14, 14A, 20

Listed in the desired ascending sort order.

How can I formulate an ORDER BY clause to achieve this order?

Per default, postgres has to resort to alphabetical order which would be:

1, 2, 14, 20, 2A, 14A

Can this be done using only the string-manipulation features that come with Postgres? (replace(), regex_replace() etc?)

My first idea was:

  1. cut the letter, if present
  2. number * 100
  3. add ascii of letter, if present

This would yield the desired result as the mapped values would be:

100, 200, 265, 1400, 1465, 2000

I could also index this manipulated value to speed up sorting.

Additional restrictions:

I cannot use casts to hex numbers, because eg.: 14Z is valid too.

Ideally, the result is a single expression. I'd need to use this transformation for filtering and sorting like:

SELECT * FROM table WHERE transform(numbers) < 15 ORDER BY transform(numbers)
RESULT:

 1, 2, 2A, 14, 14A

I tried to implement my idea, using what I learned from @klin's answer:

  1. Cut the letter and multiply number by 100:

    substring('12A' from '(\d+).*')::int*100
    
  2. Cut the numbers and get ASCII of letter:

    ascii(substring('12A' from '\d+([A-Z])'))
    
  3. Add the two.

This works fine with 12A, but does not work with 12, as the second expression returns NULL and not 0 (numeric zero). Any ideas?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
billdoor
  • 1,999
  • 4
  • 28
  • 54

2 Answers2

3

Based on these assumptions:

  • Numbers consist of digits and optionally one pending letter and nothing else.
  • There is always at least one leading digit.
  • All letters are either upper case [A-Z] or lower case [a-z], but not mixed.

I would enforce that with a CHECK constraint on the table column to be absolutely reliable.

Create a tiny IMMUTABLE SQL function:

CREATE OR REPLACE FUNCTION f_nr2sort(text)
  RETURNS int AS
$func$
   SELECT CASE WHEN right($1, 1) > '9' COLLATE "C"  -- no collation
               THEN left($1, -1)::int * 100 + ascii(right($1, 1))
               ELSE $1::int * 100 END  -- only digits
$func$ LANGUAGE SQL IMMUTABLE;

Optimized for performance based on above assumptions. I replaced all regular expressions with the much cheaper left() and right().

I disabled collation rules with COLLATE "C" for the CASE expression (it's cheaper, too) to assure default byte order of ASCII letters. Letters in [a-zA-Z] sort above '9' and if that's the case for the last letter, we proceed accordingly.
This way we avoid adding NULL values and don't need to fix with COALESCE.

Then your query can be:

SELECT *
FROM   tbl
WHERE  f_nr2sort(numbers) < f_nr2sort('15C')
ORDER  BY f_nr2sort(numbers);

Since the function is IMMUTABLE, you can even create a simple functional index to support this class of queries:

CREATE INDEX tbl_foo_id ON tbl (f_nr2sort(numbers));
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • very clear and straight forward solution, thank you! I added the one i found to my question - in case someone ants to avoid using stored_procedures - it uses coalesce as you mentioned - so you even included the approach i found plus pointed out the immutable keyword for indexing. – billdoor Dec 03 '15 at 09:42
0

I am new at PostgreSQL, but I found this very useful post: Alphanumeric sorting with PostgreSQL

So what about something like this:

select val
from test
order by (substring(val, '^[0-9]+'))::int, substring(val, '[^0-9_].*$') desc

Hope it helps

Community
  • 1
  • 1
goyiki
  • 45
  • 6