\b regular expression character in Oracle 11g

Question

Unfortunately \b regular expression character doesn't work in Oracle.

As a workaround I found following expression:

(^|\s|\W)(100100|100101|100102|100103)($|\s|\W)

(see: The missing \b regular expression special character in Oracle.), but in the test string data:

Test string 100100/100101, ABC-DEF,  100102 100103 test data abc100100 100100abc.

values 100101 and 100103 are not matched, while I am expecting them to be matched like it is the case of \b expression.

Is there any way to make it working? I am using Oracle 11g.

I would be appreciated for any help.

EDIT:

My goal is to tag all matches. The output that I am expecting is:

Test string [ddd]100100[/ddd]/[ddd]100101[/ddd], ABC-DEF,  [ddd]100102[/ddd] [ddd]100103[/ddd] test data abc100100 100100abc.

In this purpose I am using following statement:

regexp_replace(p_text,'(^|\s|\W)(' || l_ids || ')($|\s|\W)', '\1[ddd]\2[/ddd]\3');

Where:

l_ids - list of ids separated by |, id can contain number, letters, underscores and dashes
p_text - input text

EDIT 2:

In the above test string value 100100 should not be matched in the word abc100100 as well as 100100abc.

Second `(^|\s|\W)` should be `($|\s|\W)`. Furthermore `\s` is already included in `\W`. Note that the `/` following the first number gets consumed by the first pattern and thus can't be used for the second match, that's hard to work around without lookaheads. — Sebastian Proske, May 03 '17 at 13:39
@Sebastian Proske regarding `($|\s|\W)` of course you are right. It was a copy/paste mistake. But still `100101` and `100103` values are not matched. Regarding `lookaheads` - yes I know, but as far as I know it also doesn't work in Oracle 11g... Therefore I am looking for another workaround... — kpater87, May 03 '17 at 13:47
Migrate to PostgreSQL and use [PL/Perl](https://en.wikipedia.org/wiki/PL/Perl) ;-) But it also possible to write Java stored procedures, which will give you more advanced regular expression handling. — ceving, May 03 '17 at 13:47
@ceving it's better than nothing, but I am worried it won't be possible in my case :P — kpater87, May 03 '17 at 13:48
@kpater87 A simple example for UUID: http://stackoverflow.com/a/13966531/402322 — ceving, May 03 '17 at 13:50

David דודו Markovitz · Accepted Answer · 2017-05-03T16:27:05.783

Assuming -

chr(1) does not appear in the text
Any character that is not in [a-zA-Z0-9] is considered as a delimiter (e.g. /)

with t (p_text) as (select 'Test string 100100/100101, ABC-DEF,  100102 100103 test data abc100100 100100abc.' from dual)

select  replace
        (
            regexp_replace
            (
                regexp_replace
                (
                    p_text
                   ,'([a-zA-Z0-9]+)'
                   ,chr(1) || '\1' || chr(1)
                )
               ,chr(1) || '(100100|100101|100102|100103)' || chr(1)
               ,'[ddd]\1[/ddd]'
            )
           ,chr(1)
        ) 

from    t

Test string [ddd]100100[/ddd]/[ddd]100101[/ddd], ABC-DEF, [ddd]100102[/ddd] [ddd]100103[/ddd] test data abc100100 100100abc.

Yes, It works! But there is one issue here. It won't work for long input strings. My tests show that first `REGEXP_REPLACE` cuts the output to 4000 bytes (this I understand). But second `REGEXP_REPLACE` cuts the output to about 1800 bytes and this is sth that I don't understand.... But using PL/SQL I can workaround this issue. Thank you. — kpater87, May 04 '17 at 08:38

\b regular expression character in Oracle 11g

1 Answers1