2

got something here I can't explain, hoping to get some help seeing what I'm obviously missing :)

While working on a solution for this question: Oracle SQL to Sort Version Numbers

I figured a clever regexp_replace + LPAD would results in a nicer sorted value. However, for some reason the LPAD kept misbehaving. Here's the "issue" in a simplified test:

  with w_data as (
     select '9'  v  from dual union all
     select '18' v  from dual
     )
  select v,
         lpad(v, 4, '0' ) a,
         regexp_replace(v, '([0-9]*)', lpad('\1', 4, '0')) b
  from w_data
  /

  V  
  -- 
  A
  ----
  B
  ----------
  9  
  0009
  00900

  18 
  0018
  001800


  2 rows selected.

so as you can see, column "a" behaves as expected .. a length 4 string, with 0's padded on left ..

However, once it goes through the regexp_replace ... it starts to get weird ... why does it behave like that ? How do I use it in conjunction with regexp_replace "properly" ? (note that my regular expression and string - as per the linked question - is a bit more complex ;) )

[edit] tried with "[0-9]+" .. still not padding properly though ..

  with w_data as (
     select '9'  v  from dual union all
     select '18' v  from dual
     )
  select v,
         lpad(v, 4, '0' ) a,
         regexp_replace(v, '([0-9]+)', lpad('\1', 4, '0')) b
  from w_data
  /

  V  
  -- 
  A
  ----
  B
  ----------
  9  
  0009
  009

  18 
  0018
  0018


  2 rows selected.

Note that the 18 comes out correctly ("0018") , however, the 9 comes out as "009" only 3 characters? should be four: "0009" ...

Community
  • 1
  • 1
Ditto
  • 3,256
  • 1
  • 14
  • 28
  • can you format better the sample, I dont understand what you want. Even when is OK refer to another question, you should always post all relevant code here. You never know if someone delete the other question – Juan Carlos Oropeza Nov 30 '15 at 14:41
  • Note that `[0-9]*` can match an empty location before each non-matching sequence/symbol, I think you need `[0-9]+`. – Wiktor Stribiżew Nov 30 '15 at 14:41
  • sorry .. meant column "a" behaves normally ... expecting a 4 string character result ... – Ditto Nov 30 '15 at 14:41
  • @stribizhev: + is better .. but still not quite .. let me update – Ditto Nov 30 '15 at 14:42
  • It is impossible to pass a backreference in the replacement pattern of regexp_replace. You need to find another way to do that. – Wiktor Stribiżew Nov 30 '15 at 14:57
  • so I can't use LPAD there ? I have to "finish" regexp first? then send the results of that back to Lpad ? (for example?) :( bummer – Ditto Nov 30 '15 at 14:58

2 Answers2

6

If you are sure that all numbers would be no more than 4 digits, you can use this:

with w_data as (
   select '9'  v  from dual union all
   select '18' v  from dual
   )
select v,
       REGEXP_REPLACE(
         REGEXP_REPLACE(v, '([0-9]+)', '000\1'),
         '(0*)([0-9]{4})', 
         '\2') b
from w_data

If not, here is more complex solution:

with w_data as (
   select '9'  v  from dual union all
   select '18'  v  from dual union all
   select '123456' v  from dual
   )
select v,
       REGEXP_REPLACE(
         REGEXP_REPLACE(v, '([0-9]+)', '000\1'),
         '(0*)([1-9]?[0-9]{4})', 
         '\2') b
from w_data
Zorg
  • 61
  • 1
  • 2
3

LPAD doesn't understand the special syntax of regular expression substitution; that only has meaning when directly passed to a REGEXP function. So this expression:

lpad('\1', 4, '0')

is simply going to return the string '00\1'. That string is then passed to REGEXP_REPLACE, which interprets the '\1' in the regular expression context.

Dave Costa
  • 47,262
  • 8
  • 56
  • 72
  • ahhh ... ok, that makes sense ... O.o so best "workaround" might be: `lpad(regexp_replace(v, '([0-9]+)', '\1'),4,'0')` ?? – Ditto Nov 30 '15 at 15:04
  • That REGEXP_REPLACE call is a no-op, I think. Each match of the regular expression is replaced with itself. Are you trying to remove non-numeric characters from the input string? – Dave Costa Nov 30 '15 at 16:43
  • ultimately (as per the link to that other question), I'm trying to sort a string version of 9 and 18 "better", by turning them into "0009" and "0018" .. the regex comes in because it's playing with versions (ie 1.9, 1.18, etc) ;) So I'm trying to turn 1.9 into "0001.0009" (for example) ... and then it'll sort vs "0001.0018" better. (I know even regex to get me in trouble :) so I'm always looking for tasks like this to learn/practice ) :) – Ditto Nov 30 '15 at 16:50