I am trying to write a regex in PHP that allows me to capture the last instance of an HTML tag right before an instance of another HTML tag.
For example, if I have the following HTML:
<p>Para #1</p><p><a href="/path/to/keyword-here/21">Link Here</a> Para #2</p><p>Para #3</p>
I want to capture just the following, with capturing groups for keyword-here and 21:
<p><a href="/path/to/keyword-here/21">Link Here</a> Para #2</p>
I tried using the following regex, but it ended up getting everything from <p>Para #1 to the </p> after Para #2, which is too much:
'#<p.*?<a .*?(keyword-here)/(\d+).*?</a>.*?</p>#'
Because that didn't work, I then tried adding a negative lookahead as follows, but that causes no matches to be returned at all:
'#<p(?!.*<p).*?<a .*?(keyword-here)/(\d+).*?</a>.*?</p>#'
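For reference, here is a minimal script that reproduces both attempts against the sample HTML above. The variable names ($html, $lazy, $lookahead and so on) are just placeholders I picked for this post, and I'm assuming a plain preg_match call with no pattern modifiers.

```php
<?php
// Sample HTML from the example above.
$html = '<p>Para #1</p><p><a href="/path/to/keyword-here/21">Link Here</a> Para #2</p><p>Para #3</p>';

// Attempt 1: lazy quantifiers only.
$lazy = '#<p.*?<a .*?(keyword-here)/(\d+).*?</a>.*?</p>#';

// Attempt 2: the same pattern with a negative lookahead after the opening <p.
$lookahead = '#<p(?!.*<p).*?<a .*?(keyword-here)/(\d+).*?</a>.*?</p>#';

// preg_match returns 1 on a match, 0 on no match, and fills the third argument.
$lazyCount = preg_match($lazy, $html, $lazyMatch);
$lookaheadCount = preg_match($lookahead, $html, $lookaheadMatch);

var_dump($lazyCount, $lazyMatch);           // match count and groups for attempt 1
var_dump($lookaheadCount, $lookaheadMatch); // match count and groups for attempt 2
```

With this input, the first call reports a match whose full text starts at <p>Para #1, and the second call reports no match.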
So now I'm stuck. The first regex captures too much, and the second is too restrictive and matches nothing at all. What is the middle ground that gets me what I'm after?
What am I missing? Am I close, or am I approaching this in completely the wrong way? Thank you.