I have this HTML string (validated):

<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>

I need to extract only the title next to the <img> tag, trimming all whitespace before and after it, then wrap it in an <h1> tag. The expected result is:

<div><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>

I've written a regular expression that works, but the captured group also includes the trailing spaces:

/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+)\s*<br\s*\/>/

The image can be identified by the specific values of its alt, width and height attributes. Thanks.

Federico Liva

3 Answers

Actually, there's a simple enough way to do this without regex at all.

$result = '<div><h1>' . trim(strip_tags($original_html)) . '</h1></div>';

First remove all tags, then trim the whitespace, finally wrap it in whatever tags you need.
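As a minimal sketch of those three steps, assuming the fragment from the question is held in $original_html and that the title is the only text inside the div (the helper name wrap_title is hypothetical, not something from the question):

```php
<?php
// Hypothetical helper: remove every tag, trim the leftover text, re-wrap it.
function wrap_title(string $html): string
{
    // strip_tags() drops <div>, <img ... /> and <br />, leaving only the text.
    $title = trim(strip_tags($html));
    return '<div><h1>' . $title . '</h1></div>';
}

$original_html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>';

echo wrap_title($original_html), PHP_EOL;
```

Note that this discards all markup in the input, so it only fits when nothing else inside the div needs to survive.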

Okonomiyaki3000
  • [The PONY he comes](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) - and he goes right past you with a nod of approval :p – Niet the Dark Absol May 19 '14 at 09:25

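Making your match non-greedy should do the trick: <img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/> (notice the extra ? after [^<]+). With the lazy quantifier the group stops as early as it can, so the \s* that follows consumes the trailing spaces instead of the group capturing them. More information available here.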

That being said, you should really be using something like the PHP DOM Parser to process HTML.
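For illustration, here is a rough sketch of the non-greedy pattern used with preg_replace, assuming the input is exactly the fragment from the question (the variable names $html, $pattern and $result are made up for this example):

```php
<?php
$html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>';

// Same pattern as in the question, with the capture group made lazy (+?),
// so the \s* after the group absorbs the trailing spaces.
$pattern = '/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/>/';

// Replace the whole <img ...> title <br /> run with the wrapped title.
$result = preg_replace($pattern, '<h1>$1</h1>', $html);

echo $result, PHP_EOL;
```

The pattern still depends on the attribute order and exact values shown in the question, which is one reason a DOM-based approach tends to be more robust.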

npinti

I think a better solution is to use jQuery, specifically the .text() method.

<div id='mydiv'><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>
 <script>var text = $('#mydiv').text().trim();$('#mydiv').html('<h1>' + text + '</h1>');</script>

And the result is:

 <div id="mydiv"><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>

Erik Lucio
  • Sure, it's easy to do this in jQuery but... maybe there's a reason he'd rather do it server side, maybe he doesn't use jQuery, and anyway, if that html hits the site, the browser will start to load that image before jQuery can restructure it. If he's got a lot of these on the page, he surely doesn't want that. – Okonomiyaki3000 May 19 '14 at 08:49