7

e.g,

foo1
foo2
foo10
foo100

rather than

foo1
foo10
foo100
foo2

Update: not interested in coding the sort myself (although that's interesting in its own right), but having the database to do the sort for me.

Mark Harrison
  • 297,451
  • 125
  • 333
  • 465
  • [Jeff also has a post](http://www.codinghorror.com/blog/archives/001018.html) on the topic, with more resources for other languages. – Tom Ritter Oct 20 '08 at 21:29

3 Answers3

8

You can use functions in your order-by clause. In this case, you can split the non-numeric and numeric portions of the field and use them as two of the ordering criteria.

select * from t
 order by to_number(regexp_substr(a,'^[0-9]+')),
          to_number(regexp_substr(a,'[0-9]+$')),
          a;

You can also create a function-based index to support this:

create index t_ix1
    on t (to_number(regexp_substr(a, '^[0-9]+')),
          to_number(regexp_substr(a, '[0-9]+$')), 
          a);
Mark Harrison
  • 297,451
  • 125
  • 333
  • 465
  • If you want SQL answers, you should probably clarify your question to specify that. Or do you want to collect a variety of sorting techniques? – Justin Voss Oct 20 '08 at 21:23
  • 2
    He posted the question and answer at the same time. He is either looking to share this bit of knowledge he discovered, collect rep points, or both. – postfuturist Oct 20 '08 at 21:26
  • 2
    Not to be impolite, but it is tagged as SQL. – kafuchau Oct 20 '08 at 21:26
  • Be that as it may, a bit more of a description in the question wouldn't hurt – Davy8 Oct 20 '08 at 21:31
  • 1
    Doesn't $[0-9]+ match digits after the end of the line and so always be null? Also, this doesn't work if there is more than one group of digits. – Joseph Bui Aug 02 '10 at 17:36
5

For short strings, small number of numerics

If number of "numerics" and the maximum length are limited, there is a regexp-based solution.

The idea is:

  • Pad all numerics with 20 zeroes
  • Remove excessive zeroes using another regexp. This might be slow due to regexp backtracking.

Assumptions:

  • Maximum length of numerics is known beforehand (e.g. 20)
  • All the numerics can be padded (in other words, lpad('1 ', 3000, '1 ') will fail due do unable to fit padded numerics into varchar2(4000))

The following query is optimized for "short numerics" case (see *?) and it takes 0.4 seconds. However, when using such approach, you need to predefine padding length.

select * from (
  select dbms_random.string('X', 30) val from xmltable('1 to 1000')
)
order by regexp_replace(regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'\1')
                      , '0*?(\d{21}(\D|$))', '\1');

"Clever" approach

Even though separate natural_sort function can be handy, there is a little-known trick to do that in pure SQL.

Key ideas:

  • Strip leading zeroes from all the numerics so 02 is ordered between 1 and 3: regexp_replace(val, '(^|\D)0+(\d+)', '\1\2'). Note: this might result in "unexpected" sorting of 10.02 > 10.1 (since 02 is converted to 2), however there is no single answer how things like 10.02.03 should be sorted
  • Convert " to "" so text with quotes works properly
  • Convert input string to comma delimited format: '"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
  • Convert csv to the list of items via xmltable
  • Augment numeric-like items so string sort works properly
  • Use length(length(num))||length(num)||num instead of lpad(num, 10, '0') as the latter is less compact and does not support 11+ digit numbers. Note:

Response time is something like 3-4 seconds for sorting list of 1000 random strings of length 30 (the generation of the random strings takes 0.2 sec itself). The main time consumer is xmltable that splits text into rows. If using PL/SQL instead of xmltable to split string into rows the response time reduces to 0.4sec for the same 1000 rows.

The following query performs natural sort of 100 random alpha-numeric strings (note: it produces wrong results in Oracle 11.2.0.4 and it works in 12.1.0.2):

select *
  from (
    select (select listagg(case when regexp_like(w, '^[0-9]')
                                then length(length(w))||length(w)||w else w
                           end
                   ) within group (order by ord)
              from xmltable(t.csv columns w varchar2(4000) path '.'
                                        , ord for ordinality) q
           ) order_by
         , t.*
    from (
           select '"'||regexp_replace(replace(
                                          regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
                                        , '"', '""')
                                    , '([^0-9]+)', '","\1","')||'"' csv
                , t.*
           from (
                  select dbms_random.string('X', 30) val from xmltable('1 to 100')
                ) t
         ) t
  ) t
order by order_by;

The fun part is this order by can be expressed without subqueries, so it is a handy tool to make your reviewer crazy (it works in both 11.2.0.4 and 12.1.0.2):

select *
  from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
 order by (
   select listagg(case when regexp_like(w, '^[0-9]')
                       then length(length(w))||length(w)||w else w
                  end
          ) within group (order by ord)
     from xmltable('$X'
            passing xmlquery(('"'||regexp_replace(replace(
                                                     regexp_replace(t.val, '(^|\D)0+(\d+)', '\1\2')
                                                   , '"', '""')
                                                , '([^0-9]+)', '","\1","')||'"')
                             returning sequence
                    ) as X
            columns w varchar2(4000) path '.', ord for ordinality) q
);
Vladimir Sitnikov
  • 1,467
  • 12
  • 31
4

I use the following function to 0-pad all sequences of digits shorter than 10 that could be found in the value, so that the total length of each to become 10 digits. It is compatible even with mixed sets of values that have one, many or none sequences of digits in them.

CREATE OR replace function NATURAL_ORDER(
    P_STR   varchar2
) return varchar2
IS
/** --------------------------------------------------------------------
    Replaces all sequences of numbers shorter than 10 digits by 0-padded
    numbers that exactly 10 digits in length. Usefull for ordering-by
    using NATURAL ORDER algorithm.
 */
    l_result  varchar2( 32700 );
    l_len     integer;
    l_ix      integer;
    l_end     integer;
begin
    l_result := P_STR;
    l_len := LENGTH( l_result );
    l_ix := 1;
    while l_len > 0 loop
        l_ix := REGEXP_INSTR( l_result, '[0-9]{1,9}', l_ix, 1, 0 );
        EXIT when l_ix = 0;
        l_end := REGEXP_INSTR( l_result, '[^0-9]|$', l_ix, 1, 0 );
        if ( l_end - l_ix >= 10 ) then
            l_ix := l_end;
        else
            l_result := substr( l_result, 1, l_ix - 1 )
                     || LPAD( SUBSTR( l_result, l_ix, l_end-l_ix ), 10, '0' )
                     || substr( l_result, l_end )
                     ;
            l_ix := l_ix + 10;
        end if;
    end loop;
    return l_result;
end;
/

For example:

select 'ABC' || LVL || 'DEF' as STR
  from (
          select LEVEL as LVL
            from DUAL
           start with 1=1
           connect by LEVEL <= 35
       )
 order by NATURAL_ORDER( STR )
Coder
  • 41
  • 1
  • Clever! Thanks for sharing. Although I suspect not particularly efficient. And obviously breaks down with larger integers with more than 10 digits. Still, interesting. – ddevienne Feb 14 '20 at 10:28