2

My regex:

^\+?(-?)0*([[:digit:]]+,[[:digit:]]+?)0*$

It is removing leading + and leading and tailing 0s in decimal number.

I have tested it in regex101

For input: +000099,8420000 and substitution \1\2 it returns 99,842

I want the same result in Oracle database 11g:

select REGEXP_REPLACE('+000099,8420000','^\+?(-?)0*([[:digit:]]+,[[:digit:]]+?)0*$','\1\2') from dual;

But it returns 99,8420000 (tailing 0s are still present...)

What I'm missing?

EDIT

It works like greedy quantifier * at the end of regex, not lazy *? but I definitely set lazy one.

countryroadscat
  • 1,660
  • 4
  • 26
  • 49

1 Answers1

1

The problem is well-known for all those who worked with Henry Spencer's regex library implementations: lazy quantifiers should not be mixed up with greedy quantifiers in one and the same branch since that leads to undefined behavior. The TRE regex engine used in R shows the same behavior. While you may mix the lazy and greedy quantifiers to some extent, you must always make sure you get a consistent result.

The solution is to only use lazy quantifiers inside the capturing group:

select REGEXP_REPLACE('+000099,8420000', '^\+?(-?)0*([0-9]+?,[0-9]+?)0*$','\1\2') as Result from dual

See the online demo

The [0-9]+?,[0-9]+? part matches 1 or more digits but as few times as possible followed with a comma and then 1 or more digits, as few as possible.

Some more tests (select REGEXP_REPLACE('+00009,010020','[0-9]+,[0-9]+?([1-9])','\1') from dual yields +20) prove that the first quantifier in a group sets the quantifier greediness type. In the case above, Group 0 quantifier greediness is set to greedy by the first ? quantifier, and Group 1 (i.e. ([0-9]+?,[0-9]+?)) greediness type is set with the first +? (which is lazy).

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Almost. It is not working when after `,` we have only 0s like: `+000099,0000000` – countryroadscat Jul 25 '17 at 12:46
  • @ilovkatie I think yours doesn't either. Your dwelled upon different issues in the question. What is the expected result for `+000099,0000000`? What are other test cases? – Wiktor Stribiżew Jul 25 '17 at 12:50
  • https://regex101.com/r/erFPtk/3 it is working. Output should be `99,0`. But as your answer points, it will not work for oracle. – countryroadscat Jul 25 '17 at 12:55
  • Ok, the lazy quantifiers can be used inside Group 1, and that will work - `'^\+?(-?)0*([0-9]+?,[0-9]+?)0*$'` - http://rextester.com/SQZP50172, see the question update. I mentioned that the behavior is inconsistent, but not making it impossible to mix up the quantifier types. – Wiktor Stribiżew Jul 25 '17 at 12:59
  • Check my updated clarification. I think it should all be clear now. – Wiktor Stribiżew Jul 25 '17 at 13:12