I have a regex pattern, /[\w]+=[\w" :]+/, that removes attributes such as /id=""/ from XML tags. I tried to keep the pattern as generic as possible, but it also removes the /href="https:/ part of the href attribute, which I want to keep in the XML tags.

Regex pattern /[\w]+=[\w" :]+/

The source XML string is:

<table id="this is id">
<tr id="this tr id">
<a href="https://www.w3schools.com">Visit W3Schools.com!</a>
<div id="this is div id"><span id="div Class:a">this is span 
text</span></div>
</tr>
</table>

I'm expecting this output:

<table >
<tr >
<a href="https://www.w3schools.com">Visit W3Schools.com!</a>
<div ><span >this is span text</span></div>
</tr>
</table>

but I'm getting this output:

<table >
<tr >
<a //www.w3schools.com">Visit W3Schools.com!</a>
<div ><span >this is span text</span></div>
</tr>
</table>

The example above is available at this link: My RegEx pattern to remove id attribute

1 Answer

TL;DR Use a negative look-ahead assertion

Details

You can start your regular expression with a negative look-ahead assertion that will exclude the pattern you don't want matched:

(?!href)\b[\w]+=[\w" :]+

If there are two or more attributes you want to exclude, list them inside the look-ahead, separated by the "or" operator (|):

(?!href|exclude_this_too)\b[\w]+=[\w" :]+
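
As a quick check, here is a minimal Python sketch that applies both patterns above with re.sub. The choice of Python, the helper name strip_attributes, and the extra src exclusion are assumptions made for illustration; the question does not say which regex engine is in use, although look-ahead and \b are supported by most common engines.

```python
import re

# Pattern from the answer: skip attribute names that start with "href".
KEEP_HREF = re.compile(r'(?!href)\b[\w]+=[\w" :]+')

# Hypothetical second exclusion ("src"), only to show the "or" form.
KEEP_HREF_AND_SRC = re.compile(r'(?!href|src)\b[\w]+=[\w" :]+')


def strip_attributes(pattern: re.Pattern, xml: str) -> str:
    """Delete every match of the attribute pattern from the string."""
    return pattern.sub("", xml)


# Sample XML from the question.
source = '''<table id="this is id">
<tr id="this tr id">
<a href="https://www.w3schools.com">Visit W3Schools.com!</a>
<div id="this is div id"><span id="div Class:a">this is span text</span></div>
</tr>
</table>'''

cleaned = strip_attributes(KEEP_HREF, source)
print(cleaned)  # the href attribute survives, every id attribute is gone

multi = strip_attributes(KEEP_HREF_AND_SRC, '<img src="logo.png"><p id="x">text</p>')
print(multi)  # <img src="logo.png"><p >text</p>
```

Note that the look-ahead only tests how the name begins, so (?!href) would also spare a name such as hreflang.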

Demo, extended from yours

Please note the \b I also added: it requires [\w]+ to begin at a word boundary. This is important because, without it, the engine simply skips the h and matches ref=..., which is not what you want.
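
To see the effect of \b in isolation, the following small Python sketch runs the pattern with and without it on the anchor tag from the question. The variable names are mine, and Python is only an assumed stand-in for whatever engine you are using.

```python
import re

tag = '<a href="https://www.w3schools.com">'

# Look-ahead only: the match can still start one character later, at "ref".
without_boundary = re.compile(r'(?!href)[\w]+=[\w" :]+')

# Look-ahead plus \b: "ref" does not start at a word boundary, so it is rejected.
with_boundary = re.compile(r'(?!href)\b[\w]+=[\w" :]+')

broken = without_boundary.sub("", tag)
intact = with_boundary.sub("", tag)

print(broken)  # <a h//www.w3schools.com">
print(intact)  # <a href="https://www.w3schools.com">
```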

joanis