I wrote a regular expression that selects images and splits each one into parts. Here is an example. It generally works without any problem, but I found that the regular expression works incorrectly when multiple images follow one another. How can I fix this problem?

Regex

<img(.*?)src=(?:'|")((?:.*?)\.(?:gif))(?:'|")(.*?)\/?>
Den Kison

1 Answer

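$dom = new DOMDocument();
// $input_html_here is the HTML string you would otherwise pass to preg_match
$dom->loadHTML($input_html_here);
// Collect every <img> element, however the tags are arranged
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute("src");
    // do stuff here
    echo "<br />";
}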

Regexes are 100% the WRONG tool to use for this job.
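
To illustrate why the pattern from the question misbehaves on adjacent images, here is a minimal sketch. The sample markup in `$html` is made up for the demonstration and is not from the question. Because `.*?` is allowed to cross tag boundaries, a non-GIF image followed by a GIF produces a single match that starts at the first `<img` and captures both tags as one `src`.

```php
<?php
// Hypothetical input: a PNG image immediately followed by a GIF image.
$html = '<img src="a.png"><img src="b.gif">';

// The pattern from the question, wrapped in ~ delimiters.
$pattern = '~<img(.*?)src=(?:\'|")((?:.*?)\.(?:gif))(?:\'|")(.*?)\/?>~';

preg_match_all($pattern, $html, $matches);

// The lazy .*? is free to run past the end of the first tag, so the
// captured "src" swallows everything up to the first ".gif".
$captured = $matches[2][0];
echo $captured, PHP_EOL; // a.png"><img src="b.gif
```

A DOM parser does not have this problem, since each `img` element is returned as a separate node with its own attributes.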

Niet the Dark Absol
  • I'm not familiar with this method. Could you please tell me what `$input_html_here` is, which you passed as an argument to the `loadHTML()` function? – Shafizadeh May 09 '16 at 13:26
  • Thanks, I will try this, but as far as I remember I cannot use DOMDocument because my HTML can't be parsed with it; my HTML code is not fully valid. – Den Kison May 09 '16 at 13:29
  • @Shafizadeh Well presumably there is some HTML which is currently being passed to `preg_match`. Just pass it to `loadHTML` here. – Niet the Dark Absol May 09 '16 at 13:29
  • @Kison Use `libxml_use_internal_errors(true);` to hide any parsing errors. It's HTML, so it will give *something*, and in this case ignoring errors is reasonable since you're only interested in the `img` tags, which don't have contents. – Niet the Dark Absol May 09 '16 at 13:30
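
The suggestion in the last comment can be sketched as follows. This is only an illustration: the malformed sample in `$html` and the `.gif` filter are assumptions added for the example, not part of the original answer.

```php
<?php
// Hypothetical, deliberately invalid HTML: unclosed <p>, stray </div>.
$html = '<p><img src="one.gif" alt="first"><img src=\'two.png\'></div><img class="x" src="three.GIF">';

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected errors; only the img elements matter here.
libxml_clear_errors();

$gifs = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    $src = $image->getAttribute('src');
    // Keep only sources ending in .gif (case-insensitive).
    if (preg_match('/\.gif$/i', $src)) {
        $gifs[] = $src;
    }
}

print_r($gifs);
```

Other attributes of each image can be read the same way with `getAttribute()`, which replaces the extra capture groups of the original pattern.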