I want to write a regex that matches links in HTML code. An example will explain it better:
<a href="I NEED THIS1"> <img src="I NEED THIS2"> </a> <a href="I DONT NEED THIS" title="something"> </a> <a href="I NEED THIS3" title="blah"> <figure> <img src="I NEED THIS4" alt=""> </figure> </a>
I tried the following, but in the second match it captures I DONT NEED THIS as the href instead of I NEED THIS3:
<a href="([^"]*)"\s*.*?<img src="(.*?)".*?\s*<\/a>
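To make the problem reproducible, here is a minimal sketch in Python (used only as a quick test harness, not the Excel VBA target) that runs the pattern above on the sample. The variable names are illustrative, and the sample is assumed to be on a single line, since "." does not match line breaks by default.

```python
import re

# The sample from above, kept on one line so that "." can run across tags.
html = (
    '<a href="I NEED THIS1"> <img src="I NEED THIS2"> </a> '
    '<a href="I DONT NEED THIS" title="something"> </a> '
    '<a href="I NEED THIS3" title="blah"> '
    '<figure> <img src="I NEED THIS4" alt=""> </figure> </a>'
)

# The pattern exactly as tried above.
pattern = re.compile(r'<a href="([^"]*)"\s*.*?<img src="(.*?)".*?\s*<\/a>')

# Each match is an (href, src) pair.
matches = pattern.findall(html)
for href, src in matches:
    print(href, "->", src)
# I NEED THIS1 -> I NEED THIS2
# I DONT NEED THIS -> I NEED THIS4
```

The second pair shows the issue: the lazy .*? after the href is still free to run past the first </a> until it finds the next <img src=.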
I tried to add a negative lookahead, (?!</a>), but no matter where I put it, the regex behaves as if I had not added it at all. I am not sure I understand negative lookahead correctly.
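One placement that does appear to work on the sample is to check the lookahead before every single character consumed between the tags, rather than once at a fixed position. The sketch below is in Python for quick testing only; the names LINK_WITH_IMAGE and find_image_links are made up for illustration. It uses [\s\S] instead of "." on the assumption that the VBA regex engine has no "dot matches newline" option, and as far as I know that engine supports lookahead but not lookbehind, so the same pattern should carry over. Treat it as a sketch under those assumptions, not a general HTML parser.

```python
import re

# (?:(?!</a>)[\s\S])*? consumes one character at a time, but only while
# the text at the current position does not start with "</a>". This keeps
# the match from running out of the current link into the next one.
LINK_WITH_IMAGE = re.compile(
    r'<a href="([^"]*)"'      # group 1: href of the link
    r'(?:(?!</a>)[\s\S])*?'   # anything that stays inside this <a>
    r'<img src="([^"]*)"'     # group 2: src of the image
    r'(?:(?!</a>)[\s\S])*?'   # rest of the link body
    r'</a>'
)

def find_image_links(html):
    """Return (href, src) pairs for links that contain an <img>."""
    return LINK_WITH_IMAGE.findall(html)

html = (
    '<a href="I NEED THIS1"> <img src="I NEED THIS2"> </a> '
    '<a href="I DONT NEED THIS" title="something"> </a> '
    '<a href="I NEED THIS3" title="blah"> '
    '<figure> <img src="I NEED THIS4" alt=""> </figure> </a>'
)

pairs = find_image_links(html)
for href, src in pairs:
    print(href, "->", src)
# I NEED THIS1 -> I NEED THIS2
# I NEED THIS3 -> I NEED THIS4
```

A single (?!</a>) placed once in the pattern only tests one position, which would explain why it seemed to have no effect: the following .*? is still free to consume a later </a>.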
As a workaround, I used a regex that finds words near each other. It works, but it is not a very elegant solution :) It matches href and img src when they are between 0 and 7 words apart:
<a href="([^"]*)"\W+(?:\w+\W+){0,7}?<img src="(.*?)".*?\s*<\/a>
The regex will be used in Excel VBA; so far I have been testing it on online regex tester websites.
Any suggestion would be helpful.