1

I have the following links:

<a href="http://example.com/src/abc.png"><img src="http://example.com/res/bca.png"></a>
<a href="http://example.com/src/hvc.gif"><img src="http://example.com/res/ncq.jpg"></a>

Using PHP, I would like to get only the links whose URL contains src, and these must point to images (png, gif, jpg, etc.). The problem is that I do not know the contents of the file in advance, only that it contains links like the ones above; in particular, I do not know the filenames of the images.

In short, is there any way to get all of the links (only the ones containing src in the filename) using PHP, and put them in an array or a string? I already have the source of the page (containing the image links) as $html.

Any help would be appreciated.

kryger
q3d

4 Answers

2

The following link will be useful for you

Regular expression solution from the second link (I edited it a little bit):

function linkExtractor($html){
 $linkArray = array();
 // The pattern has a single capture group: the value of the src attribute.
 if(preg_match_all('/<img\s+.*?src=[\"\']?([^\"\' >]*)[\"\']?[^>]*>/i',$html,$matches,PREG_SET_ORDER)){
  foreach($matches as $match){
   // $match[1] is the captured src; the pattern defines no second group.
   $linkArray[] = $match[1];
  }
 }
 return $linkArray;
}
Community
Jomoos
  • This seems to work, but I was thinking maybe to only put strings beginning with `http://images.` in the array? I don't want all images. Any ideas? – q3d Dec 17 '11 at 11:38
  • @user1015599 You may change the regular expression to include that. It may look something like this: `'/<img\s+.*?src=[\"\']?(http:\/\/images\.[^\"\' >]*)[\"\']?[^>]*>/i'` – Jomoos Dec 17 '11 at 13:44
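
An alternative to changing the pattern, as a minimal sketch: collect all `src` values first, then keep only those starting with `http://images.`. The helper name `filterByPrefix` and the sample URLs are hypothetical and only serve as illustration.

```php
<?php
// Hypothetical helper: keep only the URLs that start with the given prefix.
function filterByPrefix(array $urls, string $prefix): array
{
    $kept = array();
    foreach ($urls as $url) {
        // strncmp compares only the first strlen($prefix) bytes of both strings.
        if (strncmp($url, $prefix, strlen($prefix)) === 0) {
            $kept[] = $url;
        }
    }
    return $kept;
}

// Sample data, made up for illustration.
$urls = array(
    'http://images.example.com/a.png',
    'http://example.com/res/b.jpg',
);
$images = filterByPrefix($urls, 'http://images.');
print_r($images);
```

Filtering after extraction keeps the regular expression unchanged, so the prefix can be adjusted without touching the pattern.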
0

You should try DOMDocument.

<?php

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html); // $html is HTML content
libxml_clear_errors();

$tags_img = $dom->getElementsByTagName('img');

$images = array();

foreach($tags_img as $img)
{
    $images[] = $img->getAttribute('src');
}

echo '<pre>';
print_r($images);
echo '</pre>';

?>

Additionally, you can add a domain check, for example to keep only images from xyz.com.
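
A domain check could look roughly like the following sketch, which compares the host part of each `src` with `parse_url`. The function name `imagesFromHost` and the sample HTML are assumptions made for this example.

```php
<?php
// Hypothetical helper: return the src of every <img> whose host equals $host.
function imagesFromHost(string $html, string $host): array
{
    // Collect parser warnings internally so sloppy HTML does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = array();
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // parse_url returns the host as a string, or null if the URL has none.
        if (parse_url($src, PHP_URL_HOST) === $host) {
            $images[] = $src;
        }
    }
    return $images;
}

// Sample markup, made up for illustration.
$html = '<img src="http://xyz.com/res/bca.png"><img src="http://other.com/res/ncq.jpg">';
$images = imagesFromHost($html, 'xyz.com');
print_r($images);
```

The comparison is exact, so subdomains such as www.xyz.com would need their own rule.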

Brad Larson
Kliptu
0

Have you tried something like this?

$regexp = "<img[^>]*?src=\"([^\"]*?)\"[^>]*?>";

// No U modifier here: it would invert the lazy quantifiers and make them greedy.
if(preg_match_all("/$regexp/si", $html, $matches)) {
  echo "<pre>";
  print_r($matches);
  echo "</pre>";
}

You should probably use something like SimpleHTMLDOM though.

ghstcode
0

If you don't want to use external libraries, you can use PHP's built-in DOM extension: http://www.php.net/manual/en/book.dom.php

Example code

<?php

//string is the (x)html document
$links = array();
$string = '<html><body><a href="http://xyz.com/src/abc.png"><img src="http://xyz.com/res/bca.png"></a><a href="http://xyz.com/src/hvc.gif"><img src="http://xyz.com/res/ncq.jpg"></a></body></html>';

//Load/parse the (x)html document
$doc = new DOMDocument();
$doc->loadHTML($string);

//get all 'a' elements (links)
$elements = $doc->getElementsByTagName('a');

//Now check if we got results
if($elements->length >= 1)
{
   //We got results, check each result
   foreach($elements as $element)
   {
      //Check if this Link has an img child element
      $img = $element->getElementsByTagName('img');
      //You can validate if the src contains .jpg extension if you want
      //but for this example I'm skipping this
      if($img->length == 1)
      {
         //We got a link that has an img child element, store the link
         $links[] = $element->getAttribute('href');
      }
   }

   //show all links
   echo '<pre>'."\r\n";
   print_r($links);
   echo '</pre>'."\r\n";

}
?>
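
The validation skipped in the example above could be sketched as follows: keep a link only if its path contains `/src/` and ends in an image extension. The function name `srcImageLinks`, the extension list, and the third sample link are assumptions for illustration.

```php
<?php
// Hypothetical helper: collect the href of every <a> that wraps an <img>,
// keeping only hrefs whose path contains /src/ and ends in an image extension.
function srcImageLinks(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getElementsByTagName('img')->length === 0) {
            continue; // this link does not wrap an image
        }
        $href = $a->getAttribute('href');
        // Cast to string because parse_url may return null or false.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (strpos($path, '/src/') !== false
            && preg_match('/\.(png|gif|jpe?g)$/i', $path) === 1) {
            $links[] = $href;
        }
    }
    return $links;
}

// The two links from the question plus one made-up link that should be skipped.
$html = '<a href="http://xyz.com/src/abc.png"><img src="http://xyz.com/res/bca.png"></a>'
      . '<a href="http://xyz.com/src/hvc.gif"><img src="http://xyz.com/res/ncq.jpg"></a>'
      . '<a href="http://xyz.com/page.html"><img src="http://xyz.com/res/x.png"></a>';
$links = srcImageLinks($html);
print_r($links);
```

Checking the path rather than the whole URL avoids false matches on query strings or host names that happen to contain "src".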
Cecil Zorg