I am trying to replace the word "group" with "group1" wherever it does not fall between the pattern <a href and a>. The query below replaces every occurrence of "group", including the ones inside that pattern. How can I replace only the occurrences that are outside the pattern?

with t as (
    select '<a href Part of the technical Network Group www.tech.com/sites/ hh a> group' as text from dual
    union all select '<a href mean www.tech.technical Network a>' as text from dual
    union all select 'www.tech.tech///technical <a href Network Group a>' as text from dual)
select regexp_replace(text, 'group', 'group1', 1, 0, 'i')
from t
where regexp_like(text, '<a href.*group.*a>', 'i');

Expected output for the first row, where the word "group" appears both inside and outside the pattern and only the occurrence outside should be replaced:

<a href Part of the technical Network Group www.tech.com/sites/ hh a> group1
ABY
  • Please show us the exact output you expect here. – Tim Biegeleisen Sep 29 '19 at 07:49
  • In general, to figure out whether `group` occurs inside or outside an anchor, or any other, HTML tag, is beyond the ability of regex alone (it is _certainly_ beyond the regex flavor running inside Oracle). So, I recommend that you export your HTML content from Oracle to a tool more suitable for this type of work, such as Java or C#. – Tim Biegeleisen Sep 29 '19 at 07:54
  • Thank you for the advice, Tim. I will check for more options. – ABY Sep 29 '19 at 08:27

1 Answer

I would be remiss if I did not point everyone reading this to the definitive post on parsing HTML with regex: "RegEx match open tags except XHTML self-contained tags".
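For the narrow case in the question, one way to follow the comment's suggestion of doing the work outside Oracle is sketched below in Python. It is not a general HTML solution. It assumes every protected span starts with the literal text <a href, ends at the first following a>, and never nests; the function name replace_outside is hypothetical. The idea is to match either a whole protected span or the bare word, and only rewrite the match when it is the bare word.

```python
import re

# Assumption: a protected span is the literal "<a href", then anything,
# up to the first following "a>". Spans do not nest.
# The alternation tries the span first, so any "group" inside a span is
# consumed as part of the span and never matched on its own.
SPAN_OR_WORD = re.compile(r"(<a href.*?a>)|group", re.IGNORECASE | re.DOTALL)


def replace_outside(text, replacement="group1"):
    """Replace 'group' (case-insensitive) only outside <a href ... a> spans."""

    def swap(match):
        # Group 1 is set only when a whole span matched: return it unchanged.
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise the bare word matched outside any span.
        return replacement

    return SPAN_OR_WORD.sub(swap, text)


rows = [
    "<a href Part of the technical Network Group www.tech.com/sites/ hh a> group",
    "<a href mean www.tech.technical Network a>",
    "www.tech.tech///technical <a href Network Group a>",
]
for row in rows:
    print(replace_outside(row))
```

Like the original query, this matches "group" as a substring, so "groups" would become "group1s"; wrapping the word in \b anchors would restrict it to whole words.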

Gary_W