1

Working with HTML, I want to match every tag that contains a given string. For example, I want to match each hyperlink that contains the string "click here", as separate matches (one match per complete <a>...</a> element).

Example source - I want to match each of these as separate matches:

<a href="/somepage">click here</a>
<a href="/somepage">please <b>click here</b> now</a>
<a href="/somepage"><img src="/someimage" alt="click here"/></a>

So I need to start with the opening tag (e.g. <a\s+[^>]+>) and then match "click here", but only if it appears before the nearest </a> closing tag. For example, the following are not suitable:

<a\s+[^>]+>.*?click here.*?</a> starts at any link and then runs across all the HTML that follows, up to the first "click here".

<a\s+[^>]+>[^<]*click here.*?</a> only matches if no other tag appears inside the <a> before "click here".
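
To make the two failure modes concrete, here is a small sketch in Python (the question does not name a regex engine, so Python's `re` module is an assumption, and the sample `html` string is made up for illustration). The first attempt swallows an unrelated link on its way to "click here"; the second finds nothing because of the nested <b> tag.

```python
import re

# Hypothetical sample: an unrelated link followed by the link we want.
html = (
    '<a href="/other">something else</a> '
    '<a href="/somepage">please <b>click here</b> now</a>'
)

# Attempt 1: the lazy .*? is still free to cross the first </a>.
attempt1 = re.compile(r'<a\s+[^>]+>.*?click here.*?</a>')

# Attempt 2: [^<]* forbids any tag between the opening <a> and "click here".
attempt2 = re.compile(r'<a\s+[^>]+>[^<]*click here.*?</a>')

m1 = attempt1.search(html)
print(m1.group(0) == html)      # the match spans both links
print(attempt2.search(html))    # no match: "click here" sits inside <b>
```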

Only idea so far:

<a\s+[^>]+>(?:.*?(?=</a>)) will match everything within a specific <a> tag, but I don't know how to then "back-check" for text within the (?:) group. Is that possible?

Cœur
ingredient_15939

2 Answers

2

I understand you want to match an <a> tag that contains the text "click here", possibly with other tags inside it. You also need to avoid a match that spans two links, like this:

<a href="#">Hi there</a> <a href="#">Hi, <b>click here</b></a>

and instead match only the second link:

<a href="#">Hi, <b>click here</b></a>

What you need is to make sure that no closing </a> appears between the opening tag and the "click here" text. This should work:

<a\s+[^>]+>((?!</a).)*click here.*?</a>
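
As a rough check of this idea, the sketch below runs a variant of the pattern in Python (an assumed engine; the thread does not name one). Two details are my own choices, not part of the answer: the tail is written as a lazy `.*?` so the match stops at the nearest </a> when several links share a line, and `re.DOTALL` is set so links that span line breaks are also covered. The name `LINK_WITH_TEXT` and the sample `html` are hypothetical.

```python
import re

# (?:(?!</a).)* consumes one character at a time, but only while the next
# characters are not "</a", so it can never leave the current link.
LINK_WITH_TEXT = re.compile(
    r'<a\s+[^>]+>(?:(?!</a).)*click here.*?</a>',
    re.DOTALL,  # assumption: let "." also match newlines inside a link
)

# Sample built from the links in the question plus one link without the text.
html = (
    '<a href="#">Hi there</a> '
    '<a href="/somepage">click here</a> '
    '<a href="/somepage">please <b>click here</b> now</a> '
    '<a href="/somepage"><img src="/someimage" alt="click here"/></a>'
)

matches = [m.group(0) for m in LINK_WITH_TEXT.finditer(html)]
for link in matches:
    print(link)
```

Each of the three "click here" links comes back as its own match, and the "Hi there" link is skipped because the lookahead stops the scan at its closing tag.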
Ján Stibila
0
<a [^>]*>(?:(?!<\/a>).)*?\bclick here\b(?:(?!<\/a>).)*<\/a>

Try this. See the demo:

https://regex101.com/r/sH8aR8/39
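
For readers who cannot open the demo, here is a minimal Python sketch of the same pattern (Python is an assumption; the `\/` escapes are dropped because `/` is not a delimiter there, and the sample `html` is invented). It shows the two things this version adds: the `\b` word boundaries reject "click hereafter", and the lookahead guards on both sides keep the match inside a single link.

```python
import re

# Same structure as the pattern above, without the "/" escapes.
PATTERN = re.compile(
    r'<a [^>]*>(?:(?!</a>).)*?\bclick here\b(?:(?!</a>).)*</a>'
)

# Hypothetical sample: the first link only contains "click hereafter".
html = (
    '<a href="/a">click hereafter</a> '
    '<a href="/b">please <b>click here</b> now</a>'
)

found = [m.group(0) for m in PATTERN.finditer(html)]
print(found)
```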

vks