e.g,
foo1
foo2
foo10
foo100
rather than
foo1
foo10
foo100
foo2
Update: not interested in coding the sort myself (although that's interesting in its own right), but having the database to do the sort for me.
e.g,
foo1
foo2
foo10
foo100
rather than
foo1
foo10
foo100
foo2
Update: not interested in coding the sort myself (although that's interesting in its own right), but having the database to do the sort for me.
You can use functions in your order-by clause. In this case, you can split the non-numeric and numeric portions of the field and use them as two of the ordering criteria.
select * from t
order by to_number(regexp_substr(a,'^[0-9]+')),
to_number(regexp_substr(a,'[0-9]+$')),
a;
You can also create a function-based index to support this:
create index t_ix1
on t (to_number(regexp_substr(a, '^[0-9]+')),
to_number(regexp_substr(a, '[0-9]+$')),
a);
If number of "numerics" and the maximum length are limited, there is a regexp-based solution.
The idea is:
Assumptions:
lpad('1 ', 3000, '1 ')
will fail due do unable to fit padded numerics into varchar2(4000)
)The following query is optimized for "short numerics" case (see *?
) and it takes 0.4 seconds. However, when using such approach, you need to predefine padding length.
select * from (
select dbms_random.string('X', 30) val from xmltable('1 to 1000')
)
order by regexp_replace(regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'\1')
, '0*?(\d{21}(\D|$))', '\1');
Even though separate natural_sort
function can be handy, there is a little-known trick to do that in pure SQL.
Key ideas:
02
is ordered between 1
and 3
: regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
. Note: this might result in "unexpected" sorting of 10.02
> 10.1
(since 02
is converted to 2
), however there is no single answer how things like 10.02.03
should be sorted"
to ""
so text with quotes works properly'"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
xmltable
length(length(num))||length(num)||num
instead of lpad(num, 10, '0')
as the latter is less compact and does not support 11+ digit numbers.
Note: Response time is something like 3-4 seconds for sorting list of 1000 random strings of length 30 (the generation of the random strings takes 0.2 sec itself).
The main time consumer is xmltable
that splits text into rows.
If using PL/SQL instead of xmltable
to split string into rows the response time reduces to 0.4sec for the same 1000 rows.
The following query performs natural sort of 100 random alpha-numeric strings (note: it produces wrong results in Oracle 11.2.0.4 and it works in 12.1.0.2):
select *
from (
select (select listagg(case when regexp_like(w, '^[0-9]')
then length(length(w))||length(w)||w else w
end
) within group (order by ord)
from xmltable(t.csv columns w varchar2(4000) path '.'
, ord for ordinality) q
) order_by
, t.*
from (
select '"'||regexp_replace(replace(
regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
, '"', '""')
, '([^0-9]+)', '","\1","')||'"' csv
, t.*
from (
select dbms_random.string('X', 30) val from xmltable('1 to 100')
) t
) t
) t
order by order_by;
The fun part is this order by
can be expressed without subqueries, so it is a handy tool to make your reviewer crazy (it works in both 11.2.0.4 and 12.1.0.2):
select *
from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
order by (
select listagg(case when regexp_like(w, '^[0-9]')
then length(length(w))||length(w)||w else w
end
) within group (order by ord)
from xmltable('$X'
passing xmlquery(('"'||regexp_replace(replace(
regexp_replace(t.val, '(^|\D)0+(\d+)', '\1\2')
, '"', '""')
, '([^0-9]+)', '","\1","')||'"')
returning sequence
) as X
columns w varchar2(4000) path '.', ord for ordinality) q
);
I use the following function to 0-pad all sequences of digits shorter than 10 that could be found in the value, so that the total length of each to become 10 digits. It is compatible even with mixed sets of values that have one, many or none sequences of digits in them.
CREATE OR replace function NATURAL_ORDER(
P_STR varchar2
) return varchar2
IS
/** --------------------------------------------------------------------
Replaces all sequences of numbers shorter than 10 digits by 0-padded
numbers that exactly 10 digits in length. Usefull for ordering-by
using NATURAL ORDER algorithm.
*/
l_result varchar2( 32700 );
l_len integer;
l_ix integer;
l_end integer;
begin
l_result := P_STR;
l_len := LENGTH( l_result );
l_ix := 1;
while l_len > 0 loop
l_ix := REGEXP_INSTR( l_result, '[0-9]{1,9}', l_ix, 1, 0 );
EXIT when l_ix = 0;
l_end := REGEXP_INSTR( l_result, '[^0-9]|$', l_ix, 1, 0 );
if ( l_end - l_ix >= 10 ) then
l_ix := l_end;
else
l_result := substr( l_result, 1, l_ix - 1 )
|| LPAD( SUBSTR( l_result, l_ix, l_end-l_ix ), 10, '0' )
|| substr( l_result, l_end )
;
l_ix := l_ix + 10;
end if;
end loop;
return l_result;
end;
/
For example:
select 'ABC' || LVL || 'DEF' as STR
from (
select LEVEL as LVL
from DUAL
start with 1=1
connect by LEVEL <= 35
)
order by NATURAL_ORDER( STR )