
I search for images in a forum post with this PHP code:

if(preg_match("~<img.*src=\"(.*)\".*/>~isU", $htmltext, $imatch))
{
    $imageurl = $imatch[1];
}

This finds the first image in $htmltext. However, I want to skip any images that are smilie icons. All the smilie icons are stored in the folder /forum/smilies/. How can I exclude this folder from the regular expression?

reggie

1 Answer


Using regular expressions to parse HTML is generally not recommended. You can take a look at this answer on the same problem.

This will do the trick:

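$dom = new DOMDocument();
// Forum HTML is often malformed; collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($htmltext);
libxml_clear_errors();
$images = $dom->getElementsByTagName('img');
$valid  = array();
foreach ($images as $image) {
    $src = $image->getAttribute('src');
    // Skip empty src attributes and anything under the smilies folder.
    // "=== false" also catches absolute URLs such as http://host/forum/smilies/x.gif,
    // which a "!== 0" (starts-with) check would let through.
    if ($src !== '' && stripos($src, '/forum/smilies/') === false) {
        $valid[] = $src;
    }
}
print_r($valid);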

$valid is an array containing the src of every non-smilie img in $htmltext.
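Since the question only needs the first non-smilie image, the loop can stop at the first match. Here is a minimal sketch that wraps the same DOM approach in a function. The name `firstNonSmileyImage` and the sample markup are hypothetical. The function returns null when the post contains no qualifying image.

```php
<?php
// Hypothetical helper: return the src of the first <img> in $html that is
// not under /forum/smilies/, or null if there is none.
function firstNonSmileyImage(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Forum HTML is often malformed; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        // stripos() finds the folder anywhere in the URL, ignoring case
        if ($src !== '' && stripos($src, '/forum/smilies/') === false) {
            return $src; // stop at the first qualifying image
        }
    }
    return null;
}

// Hypothetical post: a smilie followed by an uploaded photo
$post = '<p><img src="/forum/smilies/smile.gif" /> Look: '
      . '<img src="http://example.com/uploads/photo.jpg" /></p>';
var_dump(firstNonSmileyImage($post));
```

Returning early avoids building the whole $valid array when only one URL is needed.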

Carlos
  • Yes, use DOM parsing, but in your code it would probably be better to use `stripos`. Even the [php manual](http://us3.php.net/manual/en/function.preg-match.php) says: *"Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster."* – cegfault Nov 12 '12 at 11:21
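The comment's suggestion to use `stripos` comes down to case sensitivity. Here is a minimal sketch using hypothetical src values. `strpos()` is case-sensitive, so it misses a mixed-case path such as /Forum/Smilies/. `stripos()` ignores case and catches it. Both functions return false when the needle is absent and 0 when it sits at the very start, so the result must be compared with `=== false` / `!== false`, never with a loose `==`. The helper name `isSmilie` is hypothetical.

```php
<?php
// Hypothetical helper: true if $src points into the smilies folder, ignoring case
function isSmilie(string $src): bool
{
    // stripos() returns an int offset (possibly 0) or false, so use a strict check
    return stripos($src, '/forum/smilies/') !== false;
}

// Hypothetical src values: a mixed-case smilie path and an ordinary upload
$sources = array('/Forum/Smilies/wink.gif', '/uploads/photo.jpg');

foreach ($sources as $src) {
    // Compare the case-sensitive check with the case-insensitive one
    $bySensitive = strpos($src, '/forum/smilies/') !== false;
    printf("%s  strpos: %s  stripos: %s\n",
        $src,
        $bySensitive ? 'smilie' : 'not smilie',
        isSmilie($src) ? 'smilie' : 'not smilie');
}
```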