I've struggled with this as I'm out of my comfort zone when it comes to regular expressions. I've already written a simple preg_replace call (for WP, but this question is more PHP specific) that finds an img with a particular class.
I now need to take that further and match a larger HTML string that includes the figure, figcaption and img, capture the information from those in subpatterns, then output the captures in a preg_replace along with additional HTML.
My existing img preg_replace uses this pattern:
/<img ([^>]*)class="(.*?)size-large-16x9(.*?)" \/>\s*/iU
I then replace it with this
</div></div></div><img class="$3" <div class="tab_4 desk_b_6 desk_a_8 margin_auto single_block"><div class="padding_block content_styles">
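To see what each subpattern in that expression captures, here is a minimal sketch that runs the pattern through preg_match. The sample img tag is made up for illustration and is not taken from the real content. One thing worth knowing: the U (PCRE_UNGREEDY) modifier inverts greediness, so under /U the `[^>]*` is lazy and each `.*?` is greedy.

```php
<?php
// Pattern copied unchanged from the question.
// With the U modifier, [^>]* matches as little as possible and .*? as much as possible.
$pattern = '/<img ([^>]*)class="(.*?)size-large-16x9(.*?)" \/>\s*/iU';

// Hypothetical sample tag, for illustration only.
$html = '<img src="a.jpg" class="lazy size-large-16x9 extra" />';

if (preg_match($pattern, $html, $m)) {
    // $m[1]: attributes before class=  -> 'src="a.jpg" '
    // $m[2]: classes before the size class -> 'lazy '
    // $m[3]: classes after the size class  -> ' extra'
    var_dump($m[1], $m[2], $m[3]);
}
```

Note that the pattern only matches when class is the last attribute and the tag ends in `" />`, which is part of why it becomes fragile once more markup has to be matched.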
When I tried to expand this to capture the figure and figcaption values as well, I had no luck. This is the type of figure, img and figcaption I have in my content.
<figure class="align_left inline_image medium" id="post-13569 media-13569">
<a class="fresco" href="xxx.xxx.xxx/wp-content/uploads/2014/04/bg-volunteers.jpg" data-fresco-group="single-group" data-fresco-group-options="ui:'inside'" data-fresco-caption="A caption is here about this image. It's pretty hefty.">
<img class="lazy lazy-hidden size-medium" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" data-lazy-type="image" data-lazy-src="http://xxx.xxx.xxx/wp-content/uploads/2014/04/bg-volunteers-410x269.jpg" />
</a><figcaption>A caption is here about this image. It’s pretty hefty.</figcaption>
This is the intended output
</div></div></div>
<figure class="align_right inline_image medium-16x9"><a data-fresco-caption="A caption is here about this image. It's pretty hefty." data-fresco-group-options="ui:'inside'" data-fresco-group="single-group" href="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers.jpg" class="fresco">
<img data-lazy-src="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers-1080x607.jpg" data-lazy-type="image" src="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers-1080x607.jpg" alt="" class="lazy size-medium-16x9 data-lazy-ready" style="display: inline;"><noscript><img class="size-medium-16x9" alt="" src="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers-1080x607.jpg" /></noscript></a>
<figcaption>A caption is here about this image. It’s pretty hefty.</figcaption>
</figure><div class="tab_4 desk_b_6 desk_a_8 margin_auto single_block"><div class="padding_block content_styles">
Is anyone with regular expression knowledge able to help?
I intend to turn this into a function that lets large images break out of the normal layout in long-form articles.
I don't seem to be able to post an answer to this question, but I can confirm that I followed the advice given and used the HTML parser approach. It was simple to implement and rather clean.
Check out http://simplehtmldom.sourceforge.net/ and http://web-developer-thing.blogspot.com.au/2010/02/php-simple-html-dom-parser-makes.html
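For anyone taking the same route, here is a minimal sketch of the parser approach. It uses PHP's built-in DOMDocument rather than the Simple HTML DOM library linked above, so it is a swapped-in technique and not the exact code used. The function name extract_figures, the array keys and the example.com URLs are illustrative assumptions.

```php
<?php
// Hypothetical helper: collect the class, link, image URL and caption of every <figure>.
function extract_figures(string $html): array
{
    $doc = new DOMDocument();
    // libxml warns about HTML5 tags such as <figure>; collect those warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding prefix is a common workaround to make loadHTML read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $figures = [];
    foreach ($doc->getElementsByTagName('figure') as $figure) {
        // Take the first matching descendant of each kind; item(0) is null if none exists.
        $link    = $figure->getElementsByTagName('a')->item(0);
        $img     = $figure->getElementsByTagName('img')->item(0);
        $caption = $figure->getElementsByTagName('figcaption')->item(0);

        $figures[] = [
            'figure_class' => $figure->getAttribute('class'),
            'href'         => $link ? $link->getAttribute('href') : '',
            'img_class'    => $img ? $img->getAttribute('class') : '',
            'img_src'      => $img ? $img->getAttribute('data-lazy-src') : '',
            'caption'      => $caption ? trim($caption->textContent) : '',
        ];
    }
    return $figures;
}

// Usage with a made-up figure (URLs are placeholders).
$sample = '<figure class="align_left inline_image medium">'
    . '<a class="fresco" href="http://example.com/full.jpg">'
    . '<img class="lazy size-medium" src="x.gif" data-lazy-src="http://example.com/thumb.jpg" />'
    . '</a><figcaption>A caption.</figcaption></figure>';
$figures = extract_figures($sample);
```

Each returned entry holds the values the regular expression was meant to capture, so the new wrapper markup can be built from them, for example with sprintf() and htmlspecialchars(), without depending on attribute order or spacing.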