0

I have a column of type VARCHAR(15). I want to sort it as if it were numbers, but if it has a letter, it is last in alphabetical order.

I have the data:

1
12
2
21
ABC13
ABC23

I tried:

NULLIF(regexp_replace(column, '\D', '', 'g'), '')::int

I expected:

1
2
12
21
ABC13
ABC23

But, actually resulted:

1
2
12
ABC13
21
ABC23
Guilherme
  • 31
  • 3
  • You're looking for a "natural sort". There's several answers on this site, see if one is acceptable. – Schwern Jan 15 '23 at 00:27
  • One of decisions is fill some zerros before your number like lpad(nullif(regexp_replace(column, '\D', '', 'g'), ''), 10, '0')::int – Andrew Jan 15 '23 at 06:47

2 Answers2

0

1. If you need to sort only strings with digits and "ignore" alphanumeric strings sorting:

SELECT * 
FROM string_data
ORDER BY CASE WHEN string_value LIKE '^[0-9]+$'
         THEN CAST(string_value as integer)
         ELSE NULLIF(regexp_replace(string_value, '\D', '9', 'g'), '')::int
         END;
  • string_value LIKE '^[0-9]+$' checks that given string consists of only digits
  • regexp_replace(string_value, '\D', '9', 'g') replaces all the letters (\D) in given string with a digit 9 to arrange string with letters and digits in the end of a result tuple.
  • as all the branches of a CASE should return the same datatype, all the letters are suggested to be replaced by 9. Therefore there is no natural sort by letters if a string contains any letters.

And the result looks like:

 2
 12
 21
 abc13
 ABC13
 abc23
 ABC23
 abc32

2. If you need to sort numeric values first, and then sort alphanumeric strings:

This query manipulates with column order in this way:

SELECT * 
FROM string_data 
ORDER BY COALESCE(SUBSTRING(string_value FROM '^(\d+)$')::INTEGER, 99999999999),
         SUBSTRING(string_value FROM '[a-zA-z_-]+'),
         COALESCE(SUBSTRING(string_value FROM '(\d+)$')::INTEGER, 0),
         string_value;
  • COALESCE(SUBSTRING(string_value FROM '^(\d+)$')::INTEGER, 99999999999) firstly, we retrieve strings consisting of only digits and put them in the first place in a result tuple because these strings have temporary value 99999999999 due to COALESCE. The regex ^(\d+)$ allows to do that.
  • SUBSTRING(string_value FROM '[a-zA-z_-]+') secondly, we retrieve strings which start with letters.
  • COALESCE(SUBSTRING(string_value FROM '(\d+)$')::INTEGER, 0) then, we retrieve alphanumeric strings which end with a number such as ABC123 and put them after all the sorted numbers.

So the result looks like:

 2
 12
 21
 1a
 1ab
 1ab
 abc13
 abc23
 abc32
 ABC13
 ABC23
 3f
 2r

The second approach is described here.

Here is the sql fiddle.

Albina
  • 1,901
  • 3
  • 7
  • 19
0

The answer by Albina was helpful (just remember to use ~ not LIKE) but I wanted all values that were integers-as-strings sorted numerically at the start of the result set, and all non-integer values sorted lexicographically at the end of the result set.

You can do that by left padding:

order by
    (CASE WHEN maybe_int_string ~ '^[0-9]+$' -- match int-as-string only
     THEN LPAD(maybe_int_string, 5, '0')     -- order int-as-string numerically at the start
     ELSE LPAD(maybe_int_string, 5, '9')     -- sort non-integers, grouped at the end
     END)`

In this way, 3 becomes 00003, while 12 becomes 00012, so those sort as you'd expect (not 12 then 3), and bob becomes 99bob and gets stuck at the end, where it deserves.

Andrew E
  • 7,697
  • 3
  • 42
  • 38