I'm new to regular expressions and am trying to filter HTML tags, keeping only the required attributes (src / href / style) with their values and removing all other attributes. While googling I found a regular expression that keeps only the "src" attribute, and I modified it as follows:
<([a-z][a-z0-9]*)(?:[^>]*(\s(src|href|style)=['\"][^'\"]*['\"]))?[^>]*?(\/?)>
It works fine, but there is one problem: if a tag contains more than one of the required attributes, only the last matched attribute is kept and the rest are discarded.
I'm trying to clean the following text
<title>Hello World</title>
<div fadeout"="" style="margin:0px;" class="xyz">
<img src="abc.jpg" alt="" />
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>
at https://regex101.com/#javascript using the expression above with <$1$2$4>
as the substitution string, and I get the following output:
<title>Hello World</title>
<div style="margin:0px;">
<img src="abc.jpg"/>
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>
The problem is that the "style" attribute is discarded from the anchor tag.
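For reference, the same behaviour can be reproduced outside regex101 with a plain String.prototype.replace call. This is only a minimal sketch: the "g" flag and the variable names are my assumptions, and the input is just the anchor tag from the sample text.

```javascript
// Same pattern as in the question, written as a JavaScript regex literal.
// The "g" flag is an assumption; the question does not state which flags were used.
const pattern = /<([a-z][a-z0-9]*)(?:[^>]*(\s(src|href|style)=['"][^'"]*['"]))?[^>]*?(\/?)>/g;

// The anchor tag from the sample text, which carries two wanted attributes.
const anchor = '<a style="margin:0px;" href="http://www.germany.travel/">';

// $1 = tag name, $2 = the single captured attribute, $4 = optional closing slash.
const cleaned = anchor.replace(pattern, '<$1$2$4>');

console.log(cleaned); // <a href="http://www.germany.travel/">
```

The greedy [^>]* runs to the end of the tag and backtracks until the attribute group matches, so the attribute closest to the end of the tag (href here) is the one that ends up in $2.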
I have tried to repeat the (\s(src|href|style)=['\"][^'\"]*['\"])
block using the * operator, the {3} quantifier and much more, but in vain.
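As far as I can tell, the reason repeating the block does not help is that a quantified capturing group only remembers its last repetition. A small sketch with a simplified pattern (my own reduction, assuming the wanted attributes sit directly next to each other) shows this:

```javascript
// Simplified pattern: only the attribute block, repeated with "*".
// Assumption for this demo: the wanted attributes are adjacent in the input.
const repeated = /(\s(?:src|href|style)=['"][^'"]*['"])*/;

const m = ' style="margin:0px;" href="x"'.match(repeated);

console.log(m[0]); // the whole match covers both attributes
console.log(m[1]); // the group holds only the last repetition: ' href="x"'
```

So even when the overall match spans every attribute, a back-reference such as $2 in the substitution string still expands to just one of them.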
Any suggestions?
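One direction that might work, as a rough sketch and not a verified solution: match each opening tag once, then collect the wanted attributes in a second pass inside a replacer function. The name keepAttributes and both patterns below are hypothetical, and the sketch assumes attribute values are quoted and never contain ">".

```javascript
// Hypothetical helper (not from the original expression): two-pass cleanup.
function keepAttributes(html) {
  // Pass 1: tag name, raw attribute text, optional self-closing slash.
  const tag = /<([a-z][a-z0-9]*)\b([^>]*?)(\/?)>/gi;
  // Pass 2: each wanted attribute, with a matching pair of quotes.
  const attr = /\s(?:src|href|style)=(?:"[^"]*"|'[^']*')/gi;

  return html.replace(tag, (match, name, attrs, slash) => {
    // With the "g" flag, match() returns every wanted attribute, or null if none.
    const kept = attrs.match(attr) || [];
    return '<' + name + kept.join('') + slash + '>';
  });
}

console.log(keepAttributes('<a style="margin:0px;" href="http://www.germany.travel/">'));
console.log(keepAttributes('<img src="abc.jpg" alt="" />'));
```

With this split the number of attributes per tag no longer matters, because the second pattern is applied globally to the attribute text instead of being captured by one group. Closing tags such as </a> are left untouched since the first pattern requires a letter right after "<".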