1

Unfortunately \b regular expression character doesn't work in Oracle.

As a workaround I found following expression:

(^|\s|\W)(100100|100101|100102|100103)($|\s|\W)

(see: The missing \b regular expression special character in Oracle.), but in the test string data:

Test string 100100/100101, ABC-DEF,  100102 100103 test data abc100100 100100abc.

values 100101 and 100103 are not matched, while I am expecting them to be matched like it is the case of \b expression.

Is there any way to make it working? I am using Oracle 11g.

I would be appreciated for any help.

EDIT:

My goal is to tag all matches. The output that I am expecting is:

Test string [ddd]100100[/ddd]/[ddd]100101[/ddd], ABC-DEF,  [ddd]100102[/ddd] [ddd]100103[/ddd] test data abc100100 100100abc.

In this purpose I am using following statement:

regexp_replace(p_text,'(^|\s|\W)(' || l_ids || ')($|\s|\W)', '\1[ddd]\2[/ddd]\3');

Where:

  • l_ids - list of ids separated by |, id can contain number, letters, underscores and dashes
  • p_text - input text

EDIT 2:

In the above test string value 100100 should not be matched in the word abc100100 as well as 100100abc.

kpater87
  • 1,190
  • 12
  • 31
  • do you want to output matches on different rows? – Vamsi Prabhala May 03 '17 at 13:37
  • Second `(^|\s|\W)` should be `($|\s|\W)`. Furthermore `\s` is already included in `\W`. Note that the `/` following the first number gets consumed by the first pattern and thus can't be used for the second match, that's hard to work around without lookaheads. – Sebastian Proske May 03 '17 at 13:39
  • @vkp please see my edit in the question. – kpater87 May 03 '17 at 13:44
  • @Sebastian Proske regarding `($|\s|\W)` of course you are right. It was a copy/paste mistake. But still `100101` and `100103` values are not matched. Regarding `lookaheads` - yes I know, but as far as I know it also doesn't work in Oracle 11g... Therefore I am looking for another workaround... – kpater87 May 03 '17 at 13:47
  • Migrate to PostgreSQL and use [PL/Perl](https://en.wikipedia.org/wiki/PL/Perl) ;-) But it also possible to write Java stored procedures, which will give you more advanced regular expression handling. – ceving May 03 '17 at 13:47
  • @ceving it's better than nothing, but I am worried it won't be possible in my case :P – kpater87 May 03 '17 at 13:48
  • @kpater87 A simple example for UUID: http://stackoverflow.com/a/13966531/402322 – ceving May 03 '17 at 13:50

1 Answers1

0

Assuming -

  • chr(1) does not appear in the text
  • Any character that is not in [a-zA-Z0-9] is considered as a delimiter (e.g. /)

with t (p_text) as (select 'Test string 100100/100101, ABC-DEF,  100102 100103 test data abc100100 100100abc.' from dual)

select  replace
        (
            regexp_replace
            (
                regexp_replace
                (
                    p_text
                   ,'([a-zA-Z0-9]+)'
                   ,chr(1) || '\1' || chr(1)
                )
               ,chr(1) || '(100100|100101|100102|100103)' || chr(1)
               ,'[ddd]\1[/ddd]'
            )
           ,chr(1)
        ) 

from    t

Test string [ddd]100100[/ddd]/[ddd]100101[/ddd], ABC-DEF, [ddd]100102[/ddd] [ddd]100103[/ddd] test data abc100100 100100abc.

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • Yes, It works! But there is one issue here. It won't work for long input strings. My tests show that first `REGEXP_REPLACE` cuts the output to 4000 bytes (this I understand). But second `REGEXP_REPLACE` cuts the output to about 1800 bytes and this is sth that I don't understand.... But using PL/SQL I can workaround this issue. Thank you. – kpater87 May 04 '17 at 08:38