I am cleaning up XML files that contain HTML by using RegEx.

Some files contain multiple STYLE elements, and I want to remove all of them together with their content, while keeping the text between the elements. For example:

(Test here on regex101...)

<STYLE>
   group 1
</STYLE>
   Random text here which shall not be removed.
<STYLE>
   group 2
</STYLE>
   Some more random text here which shall not be removed.
<STYLE>
   group 3
</STYLE>

I am using the following RegEx with the /s flag:

(<STYLE>).*(<\/STYLE>)

The problem is that this RegEx matches everything from the first <STYLE> (#1) to the last </STYLE> (#3), including the text in between that should be kept.

I would like each match to cover only a single element: the <STYLE> tag, its group content, and the closing </STYLE> tag. How can this be accomplished?

Sha
  • You have accepted a solution, but do realise that will only remove the first style and group, ie. you can't use it to set the second (in its current form) – grail May 08 '17 at 04:46
  • @grail - the accepted solution actually does the work correctly when using /sg flags. – Sha May 08 '17 at 06:46

1 Answer

You can make the regex non-greedy (lazy) by adding ? after the * quantifier:

(<STYLE>).*?(<\/STYLE>)
         ^^^ use ? to tell the regex engine to stop at the first closing tag

Demo here:

Regex101
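
As a rough illustration outside regex101, here is a minimal Python sketch of the same idea. It assumes Python's `re` module, where `re.DOTALL` plays the role of the /s flag and `re.sub` replaces every match by default (similar to /g). The variable names are made up for this example, and the capture groups are dropped because they are not needed for removal.

```python
import re

# Sample input from the question.
text = """<STYLE>
   group 1
</STYLE>
   Random text here which shall not be removed.
<STYLE>
   group 2
</STYLE>
   Some more random text here which shall not be removed.
<STYLE>
   group 3
</STYLE>
"""

# re.DOTALL lets "." match newlines as well, like the /s flag.
greedy = re.compile(r"<STYLE>.*</STYLE>", re.DOTALL)
lazy = re.compile(r"<STYLE>.*?</STYLE>", re.DOTALL)

# Greedy: a single match running from the first <STYLE> to the last </STYLE>.
greedy_matches = greedy.findall(text)

# Lazy: each match stops at the nearest closing tag, one per element.
lazy_matches = lazy.findall(text)

# Remove every STYLE element; the text between them is kept.
cleaned = lazy.sub("", text)
print(cleaned)
```

With the sample input, the greedy pattern yields one match and the lazy pattern yields three, so only the lazy version leaves the two lines of random text in place.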

Tim Biegeleisen