
I have a string extracted from the DB in this way:

<p><img style="margin: 5px; float: left;" alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />In a 2 week period, the Rotary Club of Playa, in partnership with the... 145 wheelchairs to disabled children and adults. </p>

I'd like to extract these three values from that string:

1- img: the whole img tag, or at least the value of its src

2- The alt value

3- The plain text, for example "In a 2 week period, the..."

Any idea how I can achieve this?

Saymon

3 Answers


If the strings are saved in that format, you can use a regular expression with preg_match.


/(img).*?alt="(.*?)".*?src="(.*?)"/


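<?php
    // Assumes alt="..." comes before src="..." inside the tag
    $reg = '/(img).*?alt="(.*?)".*?src="(.*?)"/';
    $str = '<p><img style="margin: 5px; float: left;" alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />In a 2 week period, the Rotary Club of Playa, in partnership with the... 145 wheelchairs to disabled children and adults. </p>';
    $matches = [];
    if (preg_match($reg, $str, $matches)) {
        $img = $matches[1]; // just the literal word "img", not the whole tag
        $alt = $matches[2];
        $src = $matches[3];
        print $img . ' ' . $alt . ' ' . $src . PHP_EOL;
    }
    // Plain text: strip every tag and trim the surrounding whitespace
    $text = trim(strip_tags($str));
    print $text;
?>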
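The pattern above only matches when alt appears before src. A minimal sketch that looks each attribute up separately, so their order does not matter, and uses strip_tags() for the plain text. It assumes attribute values are quoted ("..." or '...') and only looks at the first <img> in the string; the helper name getAttr is hypothetical.

```php
<?php
// Hypothetical helper: return the value of one attribute from a tag string,
// or null if the attribute is missing. Assumes quoted values ("..." or '...').
function getAttr(string $tag, string $name): ?string
{
    // \s before the name avoids matching e.g. "data-alt"; \1 closes with the same quote
    $pattern = '/\s' . preg_quote($name, '/') . '\s*=\s*(["\'])(.*?)\1/i';
    return preg_match($pattern, $tag, $m) ? $m[2] : null;
}

$str = '<p><img style="margin: 5px; float: left;" alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />In a 2 week period, the Rotary Club of Playa, in partnership with the... 145 wheelchairs to disabled children and adults. </p>';

// 1. The whole <img ...> tag (first one only)
$img = preg_match('/<img\b[^>]*>/i', $str, $m) ? $m[0] : null;

// 2. Individual attributes, in any order
$src = $img !== null ? getAttr($img, 'src') : null;
$alt = $img !== null ? getAttr($img, 'alt') : null;

// 3. Plain text: drop all tags, trim surrounding whitespace
$text = trim(strip_tags($str));

echo $img, PHP_EOL, $src, PHP_EOL, $alt, PHP_EOL, $text, PHP_EOL;
```

This still breaks on attribute values that contain a > character, which is one reason the parser-based answers below are more robust.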

You can try using an HTML parser for this. I have used DOMDocument:

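$html = '<p><img style="margin: 5px; float: left;" alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />In a 2 week period, the Rotary Club of Playa, in partnership with the... 145 wheelchairs to disabled children and adults. </p>'; // your HTML string
$dom = new DOMDocument;
$dom->loadHTML($html);
// getElementsByTagName() returns a DOMNodeList; item(0) is the first <img> element
$img = $dom->getElementsByTagName('img')->item(0);
//getting the src of image
echo $img->getAttribute('src') . PHP_EOL;
//getting the alt value
echo $img->getAttribute('alt') . PHP_EOL;
//plain text
echo $dom->textContent;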
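For reuse, the same DOMDocument approach can be wrapped in a function that also returns the whole tag (serialized with saveHTML()) and keeps libxml from printing warnings about HTML it considers malformed. A sketch, assuming only the first <img> matters; the function name extractImageData is hypothetical.

```php
<?php
// Hypothetical helper: pull the first <img> tag, its src/alt and the plain text
// out of an HTML fragment using DOMDocument instead of regular expressions.
function extractImageData(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is the first <img> element, or null if there is none
    $img = $dom->getElementsByTagName('img')->item(0);
    $root = $dom->documentElement;

    return [
        'img'  => $img ? $dom->saveHTML($img) : null,   // whole tag, serialized as HTML
        'src'  => $img ? $img->getAttribute('src') : null,
        'alt'  => $img ? $img->getAttribute('alt') : null,
        'text' => $root ? trim($root->textContent) : '', // all text, tags removed
    ];
}

$html = '<p><img style="margin: 5px; float: left;" alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />In a 2 week period, the Rotary Club of Playa, in partnership with the... 145 wheelchairs to disabled children and adults. </p>';
$data = extractImageData($html);
print_r($data);
```

Because getAttribute() returns an empty string for a missing attribute, this avoids the fatal error you get from calling ->value on the null that getNamedItem() returns in that case.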
thepiyush13
  • For some reason this breaks my code. I just copied and pasted it, and the code stops loading at this section when it is used – Saymon Dec 01 '15 at 19:15
  • The problem seems to be here: echo $img->attributes->getNamedItem('src')->value . PHP_EOL; //getting the alt value echo $img->attributes->getNamedItem('alt')->value . PHP_EOL; – Saymon Dec 01 '15 at 19:45
  • Can you please provide the error you are getting? Meanwhile, try removing PHP_EOL from the end of the statements, for example: echo $img->attributes->getNamedItem('src')->value – thepiyush13 Dec 01 '15 at 20:46

With PHP and regexp, I would do it in multiple steps.

First get the img and the plain text:

preg_match('/(<img.*?>)(.*)</i', $line, $m);
list($x, $img, $plain_text) = $m;
// Bug: This assumes the plain text does not include any tags (e.g., <B>).

This avoids having to worry about the order of the attributes, or about anything else that might make the match run past the closing >.

Then get each attribute separately (since they are unordered and optional):

preg_match('/ src=(".*?"|\'.*?\'|.*?)[ >]/i', $img, $m);
$src = $m[1];
// Bug: If the whitespace is a newline, this won't work correctly.
// Bug: It fails to remove the outer quotes, if any.

and do the same for each other attribute you need.
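A sketch of that second step that avoids the two Bug comments above: it accepts any whitespace (including newlines) before an attribute and returns values without their surrounding quotes. It collects every name=value pair in one pass with preg_match_all(); the function name parseAttributes is hypothetical, and attributes without a value (such as a bare `disabled`) are simply skipped.

```php
<?php
// Hypothetical helper: parse name/value pairs from a single tag such as <img ...>.
// Handles "double", 'single' and unquoted values; any whitespace may separate attributes.
function parseAttributes(string $tag): array
{
    $pattern = '/\s([a-z][\w:-]*)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';
    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

    $attrs = [];
    foreach ($matches as $m) {
        // Exactly one of groups 2-4 matched; unmatched trailing groups are absent
        $attrs[strtolower($m[1])] = $m[2] . ($m[3] ?? '') . ($m[4] ?? '');
    }
    return $attrs;
}

// A tag with a newline, a single-quoted value and an unquoted value
$img = '<img style="margin: 5px; float: left;"' . "\n" . "alt='rotary-wheelchairs' src=images/stories/DSC_0693_400x400.jpg />";
$attrs = parseAttributes($img);
echo $attrs['src'] ?? '', PHP_EOL, $attrs['alt'] ?? '', PHP_EOL;
```

Consuming each quoted value as a whole keeps text inside a value (like the style declarations) from being mistaken for another attribute.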

(See how much things like DOMDocument do for you!)

Rick James