
I've struggled with this, as regular expressions are outside my comfort zone. I've already written a simple preg_replace call (for WP, but this question is more PHP-specific) that finds an img with a particular class.

I now need to take that further: match a larger HTML string that includes the figure, figcaption and img, capture the information from those in subpatterns, then output them in a preg_replace along with additional HTML.

My existing img preg_replace looks for this img information

/<img ([^>]*)class="(.*?)size-large-16x9(.*?)" \/>\s*/iU

I then replace it with this

</div></div></div><img class="$3" <div class="tab_4 desk_b_6 desk_a_8 margin_auto single_block"><div class="padding_block content_styles">

When I've tried to expand this to look for the figure and figcaption values as well, I haven't had any luck. This is the type of figure, img and figcaption I have in my content.

<figure class="align_left inline_image medium" id="post-13569 media-13569">
<a class="fresco" href="xxx.xxx.xxx/wp-content/uploads/2014/04/bg-volunteers.jpg" data-fresco-group="single-group" data-fresco-group-options="ui:'inside'" data-fresco-caption="A caption is here about this image. It's pretty hefty.">
<img class="lazy lazy-hidden size-medium" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" data-lazy-type="image" data-lazy-src="http://xxx.xxx.xxx/wp-content/uploads/2014/04/bg-volunteers-410x269.jpg" />
</a><figcaption>A caption is here about this image. It&#8217;s pretty hefty.</figcaption>

This is the intended output

</div></div></div>
<figure class="align_right inline_image medium-16x9"><a data-fresco-caption="A caption is here about this image. It's pretty hefty." data-fresco-group-options="ui:'inside'" data-fresco-group="single-group" href="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers.jpg" class="fresco">
<img data-lazy-src="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers-1080x607.jpg" data-lazy-type="image" src="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers-1080x607.jpg" alt="" class="lazy size-medium-16x9 data-lazy-ready" style="display: inline;"><noscript>&lt;img class="size-medium-16x9" alt="" src="http://xxx.xxxx.xx/wp-content/uploads/2014/04/bg-volunteers-1080x607.jpg" /&gt;</noscript></a>
<figcaption>A caption is here about this image. It’s pretty hefty.</figcaption>
</figure><div class="tab_4 desk_b_6 desk_a_8 margin_auto single_block"><div class="padding_block content_styles">

Anyone with regular expression knowledge able to help?

I intend to turn this into a function that lets large images break out of the normal layout in long-form articles.

I don't seem to be able to post an answer to this question, but I can confirm that I followed the advice in the comments below and used an HTML parser. It was simple to implement and rather clean.

Check out http://simplehtmldom.sourceforge.net/ and http://web-developer-thing.blogspot.com.au/2010/02/php-simple-html-dom-parser-makes.html
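
For anyone landing here later, this is a rough sketch of the parser approach, not the exact code I ended up with. It uses PHP's bundled DOMDocument and DOMXPath rather than Simple HTML DOM, so treat that as a substitution. The function name `break_out_figures`, the placeholder comment token and the default class name are assumptions for illustration; the wrapper markup is copied from the intended output above. Because the closing and reopening divs are unbalanced on their own, they cannot be inserted as DOM nodes, so the sketch swaps each matching figure for a placeholder comment and splices the markup in as a string afterwards.

```php
<?php
// Hypothetical helper: wraps every <figure> that contains an <img> with the
// given size class in the "close layout / reopen layout" markup.
function break_out_figures(string $html, string $sizeClass = 'size-large-16x9'): string
{
    // Wrapper markup taken from the question's intended output.
    $close = '</div></div></div>';
    $open  = '<div class="tab_4 desk_b_6 desk_a_8 margin_auto single_block">'
           . '<div class="padding_block content_styles">';

    $doc = new DOMDocument();
    // libxml may warn about HTML5 tags such as <figure>; collect the warnings silently.
    libxml_use_internal_errors(true);
    // The XML declaration prefix is a common trick to make loadHTML read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html; // nothing parseable, return the input untouched
    }

    // Match whole class tokens only. Assumes $sizeClass contains no quotes.
    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//figure[.//img[contains(concat(" ", normalize-space(@class), " "), " %s ")]]',
        $sizeClass
    );

    $replacements = [];
    foreach (iterator_to_array($xpath->query($query)) as $i => $figure) {
        // Attributes and text are available here if they are needed, e.g.
        // $figure->getAttribute('class') or the figcaption's textContent.
        $token = "figure-breakout-$i";
        $replacements["<!--$token-->"] = $close . trim($doc->saveHTML($figure)) . $open;
        // Leave a placeholder comment where the figure was.
        $figure->parentNode->replaceChild($doc->createComment($token), $figure);
    }

    // Serialize the body's children so no <html>/<body> wrapper is emitted.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    // Swap each placeholder for the wrapped figure markup.
    return strtr($out, $replacements);
}
```

In WordPress this could presumably be attached with `add_filter('the_content', 'break_out_figures');`, though the serializer may normalise some markup (for example `<img ... />` comes back as `<img ...>`), so check the output against your theme first.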

Ashkas
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester May 12 '14 at 00:40
  • [Don't do this with regular expressions.](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Bergi May 12 '14 at 00:40
  • Please post an example of desired output. – Pedro Lobito May 12 '14 at 00:42
  • @PedroLobito I've added an example of my desired output. – Ashkas May 13 '14 at 23:31
  • @Bergi Fair enough and I can see the value of the arguments in that post. However, what it does not provide are the alternative approaches. Do you have any information in this regard? – Ashkas May 13 '14 at 23:34
  • Worked out a solution. – Ashkas May 14 '14 at 03:31

0 Answers