
I have a variable $content that contains text and images in this form (the number of images is unknown):

    text text text text <img src="path/to/image/1">text text text text
    <img src="path/to/image/2">
    text text text text text text text text text text text text text text text text <img src="path/to/image/3"><img src="path/to/image/4">text text text text
    <img src="path/to/image/5">

I want to extract the src of every image and store the values in an array using PHP, like so:

    array(
        1 => "path/to/image/1",
        2 => "path/to/image/2",
        3 => "path/to/image/3",
        4 => "path/to/image/4",
        5 => "path/to/image/5",
        ...
    )

What is the best way to do this? I already tried the explode function, but that approach seemed inefficient.

user1481850
  • Duplicate: http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php – worenga Jul 11 '12 at 19:48
  • Regexes are inefficient for things like this; take a look at the answer here: http://stackoverflow.com/questions/1196570/using-regular-expressions-to-extract-the-first-image-source-from-html-codes – q3d Jul 11 '12 at 19:49
  • It's better to use DOMDocument! Easy and reliable! – undone Jul 11 '12 at 19:50

3 Answers

8
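    // Collect libxml warnings internally; loadHTML() complains about sloppy markup.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($content); // $content is the HTML string from the question

    // Read the src attribute of every <img> element, in document order.
    $links = array();
    foreach ($dom->getElementsByTagName("img") as $img) {
        $links[] = $img->getAttribute("src");
    }
    libxml_clear_errors();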
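
If the list is needed in more than one place, the loop above can be wrapped in a small helper. This is only a sketch: the function name `extract_img_srcs` is hypothetical, and it swaps `getElementsByTagName` for an XPath query (`//img/@src`), so `<img>` tags without a `src` attribute are skipped instead of producing empty strings.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer).
// Returns the src of every <img> in $html, in document order.
function extract_img_srcs($html)
{
    if (trim($html) === '') {
        return array(); // loadHTML() does not accept an empty string
    }

    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $srcs  = array();
    // Select the src attribute nodes of all <img> elements.
    foreach ($xpath->query('//img/@src') as $attr) {
        $srcs[] = $attr->nodeValue;
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $srcs;
}

$content = 'text <img src="path/to/image/1">text
<img src="path/to/image/2"><img alt="no source">';

print_r(extract_img_srcs($content));
```

For this sample the helper returns the two paths; the third tag has no `src` and is left out.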
Pavel Strakhov
undone
3

Here is an example using simplehtmldom:

include("simple_html_dom.php");
$content = '
text text text text <img src="path/to/image/1">text text text text
    <img src="path/to/image/2">
text text text text text text text text text text text text text text text text <img src="path/to/image/3"><img src="path/to/image/4">text text text text 
<img src="path/to/image/5"> ';

$html = str_get_html($content);
$images = $html->find("img");
$links = array();
foreach($images as $image) {
  $links[] = $image->src;
}

print_r($links);

Output:

Array
(
    [0] => path/to/image/1
    [1] => path/to/image/2
    [2] => path/to/image/3
    [3] => path/to/image/4
    [4] => path/to/image/5
)
Pavel Strakhov
  • Don't you think that installing simplehtmldom for such a menial task is a bit of an overkill? – Mike Jul 11 '12 at 20:31
  • No, it's just a single PHP file that doesn't require installation. And HTML parsing must be done using HTML parsers, not regexp or other such things. – Pavel Strakhov Jul 12 '12 at 04:24
  • I didn't mean "installing" per se, but downloading a 340 kB compressed file (so maybe 1 MB uncompressed?) to do something that DOMDocument can easily do without "installing" anything at all is, IMHO, overkill. – Mike Jul 12 '12 at 22:39
  • @Mike Actually, that compressed file is mostly a bunch of examples! The main file is less than 50 kB and defines 2 classes. It's a good option for those who don't want all the complexity DOM has! – undone Jul 13 '12 at 13:31
0

Using regex:

<?php

$str = '    text text text text <img src="path/to/image/1">text text text text
    <img src="path/to/image/2">
text text text text text text text text text text text text text text text text <img src="path/to/image/3"><img src="path/to/image/4">text text text text
<img src="path/to/image/5">';


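// [^>]* keeps each match inside a single <img ...> tag; only double-quoted src values are handled.
preg_match_all('@<img\b[^>]*?\ssrc="([^"]*)"[^>]*>@i', $str, $out);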

print_r($out[1]);

?>

Output:

Array
(
    [0] => path/to/image/1
    [1] => path/to/image/2
    [2] => path/to/image/3
    [3] => path/to/image/4
    [4] => path/to/image/5
)
  • I wouldn't recommend regex for something that can be handled by DOM. Modify the HTML slightly and it doesn't work anymore. – Mike Jul 11 '12 at 20:23
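
To illustrate the fragility mentioned in that comment, here is a minimal sketch comparing a regex that, like the one in this answer, only recognises double-quoted `src` values with DOMDocument. The sample markup and the variable names are made up for the demonstration.

```php
<?php
// Made-up sample: the first tag quotes its src with single quotes.
$html = "<p><img src='a.png'> text <img src=\"b.png\"></p>";

// Regex that only recognises double-quoted src values.
preg_match_all('@<img[^>]*\ssrc="([^"]*)"[^>]*>@i', $html, $out);
$fromRegex = $out[1];

// DOM parser: the quoting style does not matter.
libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$fromDom = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    $fromDom[] = $img->getAttribute('src');
}
libxml_clear_errors();

print_r($fromRegex); // only b.png is found
print_r($fromDom);   // both a.png and b.png are found
```

The pattern could be extended to accept single quotes as well, but each such variation needs its own special case, whereas the parser already handles them.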