Just mark a part of a string between two words using Regex

Question

I have text:

<div class="separator" style="clear: both; text-align: center;"><img src="/demodomain.com/-13ucJuEQEUw/linktoimg.png"><a href="https://12.imgdomain.com/-13ucJuEQEUw/linktoimg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="618" data-original-width="1062" height="372" src="https://21.imgdomain.com/-13ucJuEQEUw/WsGsjY2E2bI/-13ucJuEQEUw/linktoimg.jpg" width="640"></a></div>

I use (?<=<img)(.*?)([0-9]+.imgdomain.com)(.*?)(.*?)> to mark the image domain inside the <img> tag. But it doesn't work as I expect: it also marks the image domain inside the <a> tag.

Demo picture

Demo Regex

How can I get the correct marking? Thanks!

Please, use an HTML parser instead. Read this: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Jorge Campos, Apr 10 '18 at 02:56

score 0 · Accepted Answer · answered Apr 10 '18 at 03:04

Your regex is too permissive, especially the use of .*, which matches any character. It is better to use [^>], which will not match >. This example matches only inside the img tag:

(?<=<img)([^>]*?)([0-9]+\.imgdomain\.com)[^>]*?>
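
As a rough check, here is a minimal Python sketch (Python is an assumption; the question does not name a language) that runs the question's pattern and a `[^>]`-based pattern against a shortened copy of the sample markup. The `[^>]`-based pattern is written here with escaped dots and a lazy first group so that the second group captures the whole host name. `marked_domains` is a hypothetical helper name used only for illustration.

```python
import re

# Shortened copy of the markup from the question (some attributes dropped).
html = (
    '<div class="separator"><img src="/demodomain.com/-13ucJuEQEUw/linktoimg.png">'
    '<a href="https://12.imgdomain.com/-13ucJuEQEUw/linktoimg.png" imageanchor="1">'
    '<img border="0" src="https://21.imgdomain.com/-13ucJuEQEUw/linktoimg.jpg" width="640">'
    '</a></div>'
)

# The pattern from the question: .*? is allowed to run past the end of a tag.
permissive = re.compile(r'(?<=<img)(.*?)([0-9]+.imgdomain.com)(.*?)(.*?)>')

# [^>] cannot cross a ">", so the match stays inside a single <img ...> tag.
restricted = re.compile(r'(?<=<img)([^>]*?)([0-9]+\.imgdomain\.com)[^>]*?>')


def marked_domains(pattern, text):
    """Return the second capture group (the host name) of every match."""
    return [m.group(2) for m in pattern.finditer(text)]


print(marked_domains(permissive, html))
print(marked_domains(restricted, html))
```

On this shortened sample the permissive pattern also reports 12.imgdomain.com, which comes from the <a> tag, while the restricted pattern reports only 21.imgdomain.com.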

While parsing data from HTML with regex might be OK for some very simple cases, you really should be aware of the pitfalls. For example, a tag whose attribute value contains a > will break the regex above. If you cannot assume that never happens, better use a parser. Here is the link to the live demo
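
For comparison, the parser route could look roughly like this. It is only a sketch using Python's standard-library `html.parser` (again, Python is an assumption, and `ImgHostCollector`, `img_hosts` and `HOST_RE` are hypothetical names). It reads the `src` attribute of `<img>` tags only, so attributes on `<a>` tags are never considered.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Host names of the form <digits>.imgdomain.com, as in the question.
HOST_RE = re.compile(r'[0-9]+\.imgdomain\.com')


class ImgHostCollector(HTMLParser):
    """Collect matching host names from the src attribute of <img> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return  # ignore <a> and every other tag
        src = dict(attrs).get('src') or ''
        host = urlparse(src).hostname or ''
        if HOST_RE.fullmatch(host):
            self.hosts.append(host)


def img_hosts(html):
    """Return the matching image hosts found in <img src=...> attributes."""
    parser = ImgHostCollector()
    parser.feed(html)
    parser.close()
    return parser.hosts


sample = (
    '<a href="https://12.imgdomain.com/a.png">'
    '<img alt="a > b" src="https://21.imgdomain.com/a.jpg"></a>'
)
print(img_hosts(sample))
```

The `alt="a > b"` attribute in the sample is the kind of input that trips up the `[^>]` regex, while the parser still finds the `src` host.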

Moti Korets

I like `(?<=<img[^>]*?)([0-9]+.imgdomain.com)[^>]*?>` more. :) – my notmypt Apr 10 '18 at 03:42
Happy to help, make sure to accept an answer if it helped you. Have a nice day too! – Moti Korets Apr 10 '18 at 03:45
