I'm trying to parse some HTML and remove an unnecessary duplicate link. For example, I would like the following code:
<p>
Lorem ipsum amet
<a href="http://edition.cnn.com/">
Proin lacinia posuere
</a>
sit ipsum.
</p>
<p>
<a href="http://www.google.com/articles/blah">
[caption align="alignright"]
<a href="http://www.google.com/articles/blah">
<img src="http://hoohlr.dev/Picture-142-300x222.png" alt="Blah blah/Flickr " height="222" class="size-medium wp-image-4351" />
</a>
sociis magnis [/caption]
</a>
</p>
to be converted into this (removing the link just before the [caption] as well as its closing tag after the [/caption]):
<p>
Lorem ipsum amet
<a href="http://edition.cnn.com/">
Proin lacinia posuere
</a>
sit ipsum.
</p>
<p>
[caption align="alignright"]
<a href="http://www.google.com/articles/blah">
<img src="http://hoohlr.dev/Picture-142-300x222.png" alt="Blah blah/Flickr " height="222" class="size-medium wp-image-4351" />
</a>
sociis magnis [/caption]
</p>
The link to remove is always the one immediately before the [caption]. Can anyone who is good with regex help me do this using PHP's preg_replace (or a simpler method)?
I would much appreciate it. Thanks!
Edit: OK, I've made a pretty good attempt at what I'm looking for: http://regexr.com?31t05 and http://regexr.com?31svv. I tried to post it as an answer, but the site wouldn't let me. Can anyone improve upon it?
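
For reference, here is a minimal sketch of the replacement described above, not a tested production solution. The function name strip_caption_wrapper is made up for illustration. It assumes that only whitespace sits between the outer <a ...> and [caption, and between [/caption] and the outer </a>, and that caption blocks are never nested. It does not check that the two href values match.

```php
<?php
// Hypothetical helper (the name is an assumption, not from any library).
// Removes an <a ...> that directly wraps a [caption]...[/caption] block,
// keeping the caption block and everything inside it.
function strip_caption_wrapper(string $html): string
{
    // <a\b[^>]*>        the outer opening tag, with any attributes
    // \s*               optional whitespace or newlines
    // (\[caption\b.*?\[/caption\])  the caption block, captured lazily
    // \s*</a>           optional whitespace, then the outer closing tag
    // The "s" flag lets "." match newlines so the block can span lines.
    $pattern = '~<a\b[^>]*>\s*(\[caption\b.*?\[/caption\])\s*</a>~s';

    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, '$1', $html) ?? $html;
}
```

Calling strip_caption_wrapper($html) on the first sample should leave the CNN link untouched, because that <a> is not followed directly by [caption. A regex like this is fragile against markup that deviates from the assumptions above, so a DOM-based approach may be safer for arbitrary input.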