I have text:

<div class="separator" style="clear: both; text-align: center;"><img src="/demodomain.com/-13ucJuEQEUw/linktoimg.png"><a href="https://12.imgdomain.com/-13ucJuEQEUw/linktoimg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="1062" height="372" src="https://21.imgdomain.com/-13ucJuEQEUw/WsGsjY2E2bI/-13ucJuEQEUw/linktoimg.jpg" width="640"></a></div>

I use (?<=<img)(.*?)([0-9]+.imgdomain.com)(.*?)(.*?)> to mark the image domain inside the <img> tag. But it doesn't work as I expect: it also marks the image domain inside the <a> tag.

How can I get the correct marking? Thanks!

my notmypt

1 Answer

Your regex is too permissive, especially in its use of .*, which matches any character, including >, so a match can run past the end of one tag and into the next. Better to use [^>], which will not match >. This example matches only inside the img tag.

(?<=<img)([^>]*?)([0-9]+\.imgdomain\.com)[^>]*?>
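To check the pattern outside a regex tester, here is a minimal Python sketch. The variable and function names (`IMG_DOMAIN`, `img_domains`) are made up for illustration, and the sample HTML is a shortened copy of the text in the question. In the pattern used here, the first group is lazy so that the digit group captures the whole number (21 rather than only 1), and the dots in the host name are escaped so they match literally.

```python
import re

# [^>] keeps every part of the match inside a single tag.
# Group 1 is lazy so that group 2 gets all the leading digits.
IMG_DOMAIN = re.compile(r'(?<=<img)([^>]*?)([0-9]+\.imgdomain\.com)[^>]*?>')

# Shortened copy of the HTML from the question.
html = (
    '<div class="separator">'
    '<img src="/demodomain.com/-13ucJuEQEUw/linktoimg.png">'
    '<a href="https://12.imgdomain.com/-13ucJuEQEUw/linktoimg.png" imageanchor="1">'
    '<img border="0" height="372" '
    'src="https://21.imgdomain.com/-13ucJuEQEUw/WsGsjY2E2bI/-13ucJuEQEUw/linktoimg.jpg" '
    'width="640"></a></div>'
)

def img_domains(text):
    """Return the image hosts found inside <img ...> tags only."""
    return [m.group(2) for m in IMG_DOMAIN.finditer(text)]

print(img_domains(html))  # ['21.imgdomain.com']
```

The host in the <a> tag (12.imgdomain.com) is skipped because the lookbehind only lets a match start right after `<img`, and [^>] stops it from crossing into another tag.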

While parsing data from HTML with regex might be OK for some very simple cases, you really should be aware of the pitfalls. For example, a tag whose attribute value contains a literal > will break the regex above. If you cannot assume that never happens, better use a parser. Here is the link to a live demo.
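As a rough illustration of the parser route, here is a sketch using Python's standard-library `html.parser`. The class and function names (`ImgHostCollector`, `img_hosts`) are hypothetical, and it assumes the wanted hosts look like digits followed by `.imgdomain.com`, as in the question.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed host shape, taken from the question: digits + ".imgdomain.com".
HOST_RE = re.compile(r'[0-9]+\.imgdomain\.com')

class ImgHostCollector(HTMLParser):
    """Collect the src hosts of <img> tags, ignoring every other tag."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        for name, value in attrs:
            if name == 'src' and value:
                host = urlparse(value).hostname  # None for relative paths
                if host and HOST_RE.fullmatch(host):
                    self.hosts.append(host)

def img_hosts(text):
    parser = ImgHostCollector()
    parser.feed(text)
    parser.close()
    return parser.hosts

# A quoted attribute value containing ">" is handled by the parser.
tricky = '<img alt="a > b" src="https://3.imgdomain.com/x.png">'
print(img_hosts(tricky))  # ['3.imgdomain.com']
```

Because the parser reports tags and attributes separately, the <a href> is never inspected, and no assumption about > inside attribute values is needed.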

Moti Korets