Working with HTML, I want to match all tags that contain a given string. For example, I want to match every hyperlink in which the string "click here" appears, with each complete ... tag returned as a separate match.
Example source - I want to match each of these as separate matches:
<a href="/somepage">click here</a>
<a href="/somepage">please <b>click here</b> now</a>
<a href="/somepage"><img src="/someimage" alt="click here"/></a>
So I need to start with the opening tag (e.g. <a\s+[^>]+>), then match "click here", but on condition that it appears before the next closest </a> closing tag. For example, the following are not suitable:
<a\s+[^>]+>.*?click here.*?</a>
matches from the first link, through all the HTML that follows it, up to the first "click here", even when that text belongs to a later link.
<a\s+[^>]+>[^<]*click here.*?</a>
only matches if no other tags exist inside the <a> before "click here".
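To make the two failure modes concrete, here is a small sketch in Python (my choice of language for illustration; the sample string and variable names are made up). It runs both patterns over three links placed on one line, so no flags are needed.

```python
import re

# Hypothetical sample: one unrelated link followed by two "click here" links.
html = (
    '<a href="/other">home</a>'
    '<a href="/somepage">click here</a>'
    '<a href="/somepage">please <b>click here</b> now</a>'
)

# Attempt 1: the lazy .*? is still free to run past </a> to reach "click here".
lazy = re.compile(r'<a\s+[^>]+>.*?click here.*?</a>')
lazy_matches = lazy.findall(html)

# Attempt 2: [^<]* forbids any tag between the opening <a> and "click here".
no_tags = re.compile(r'<a\s+[^>]+>[^<]*click here.*?</a>')
no_tags_matches = no_tags.findall(html)

for m in lazy_matches:
    print('lazy   :', m)
for m in no_tags_matches:
    print('no_tags:', m)
```

With this input, the first pattern returns two matches, and the first of them swallows the unrelated "home" link together with the link after it. The second pattern returns only the plain-text link and misses the one containing <b>.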
Only idea so far:
<a\s+[^>]+>(?:.*?(?=</a>))
will match everything within a specific <a> tag, but I don't know how to then "back-check" for text within the (?:) group. Is that possible?
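One possible direction, offered as a sketch rather than a definitive answer: instead of checking back after the fact, guard every character consumed inside the link with a negative lookahead, (?:(?!</a>).)*?, so the match can never step over a </a>. This is sometimes called a "tempered" dot. The example below uses Python's re module; the sample string and the names inside_link, pattern and matches are made up for illustration. It assumes links are not nested, and it also matches "click here" inside attributes, as the third example requires.

```python
import re

# Hypothetical sample: an unrelated link plus the three example links.
html = (
    '<a href="/other">home</a>'
    '<a href="/somepage">click here</a>'
    '<a href="/somepage">please <b>click here</b> now</a>'
    '<a href="/somepage"><img src="/someimage" alt="click here"/></a>'
)

# Any run of characters that never starts a closing </a> tag.
inside_link = r'(?:(?!</a>).)*?'

pattern = re.compile(
    r'<a\s+[^>]+>' + inside_link + r'click here' + inside_link + r'</a>',
    re.DOTALL,  # let "." span newlines in multi-line HTML
)

matches = pattern.findall(html)
for m in matches:
    print(m)
```

On this input the "home" link is skipped, because the guarded dot stops at its </a> before any "click here" is found, and each of the three remaining links comes back as its own match.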