3

I have arrays of HTML in different formats: [amp;src]=>image, anotherone [posthtml]=>image2, anothertwo [nbsp;image3

How can I extract the img and the text with a single common preg_match(), so that I reliably get the image src and the text from the HTML? If that is not possible with preg_match(), is there another way to do it? If anyone knows, please reply. I need your help.

Ankan Bhadra
  • What do you mean by `[amp;src]=>image, anotherone [posthtml]=>image2, anothertwo [nbsp;image3` – Shiplu Mokaddim Feb 15 '12 at 07:09
  • possible duplicate of [How to extract img src, title and alt from html using php?](http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php) – Ferdinand Beyer Feb 15 '12 at 07:10

1 Answer

9

The recommended way is to use DOM:

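$dom = new DOMDocument;
// Collect parser warnings (malformed or HTML5 markup) instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$images = $dom->getElementsByTagName('img');

$srcs = array();
foreach ($images as $im) {
    // getAttribute() returns an empty string when the attribute is missing.
    $srcs[] = $im->getAttribute('src');
}
print_r($srcs);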
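Since the question asks for the text as well as the image source, here is a minimal sketch of how the DOM approach might collect both. The sample markup, the `$result` array, and the choice to take the text of each image's parent element are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical sample input; replace with the real markup.
$html = '<div><p>First caption <img src="one.png" alt="One"></p>'
      . '<p>Second caption <img src="two.png"></p></div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$result = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    $result[] = array(
        'src'  => $img->getAttribute('src'),
        'alt'  => $img->getAttribute('alt'),            // empty string if absent
        'text' => trim($img->parentNode->textContent),  // text of the enclosing element (assumption)
    );
}
print_r($result);
```

Which element's text counts as "the text for an image" depends on the markup, so the `parentNode` choice would likely need adjusting for real pages.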

Using a regular expression:

// [^>]* keeps the match inside a single <img> tag; \s before src skips attributes such as data-src.
preg_match_all("/<img\b[^>]*?\ssrc=\"([^\"]+)\"/i", $html, $m);
print_r($m);
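To show what the match array looks like on concrete input, here is a small sketch. It uses a variant pattern (not the one in the answer) that accepts single- or double-quoted `src` values, and it uses `strip_tags()` to get the remaining text. The sample markup and the variable names `$srcs` and `$text` are assumptions for illustration.

```php
<?php
// Hypothetical sample input with mixed quoting and a data-src attribute.
$html = '<p><img class="a" src="one.png"> text <img data-src="lazy.png" src=\'two.png\'></p>';

// Group 1 captures the opening quote, group 2 the src value, \1 requires the same closing quote.
preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/i', $html, $m);
$srcs = $m[2];                      // src values, in document order

// Everything that is not markup; trimmed of surrounding whitespace.
$text = trim(strip_tags($html));

print_r($srcs);
echo $text, "\n";
```

A pattern like this still assumes quoted attribute values and would miss unquoted ones such as `src=one.png`, which is one reason the DOM approach tends to be more robust.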
Shiplu Mokaddim
  • Thanks for the reply. But the first one isn't working, and the second one returns no value. – Ankan Bhadra Feb 15 '12 at 10:32
  • @AnkanBhadra The regular expression is updated. DOMDocument does not currently recognize HTML5. – Shiplu Mokaddim Feb 15 '12 at 11:02
  • While I would love to support the argument that one shouldn't use regular expressions to parse HTML, I think this is a good use case, especially when you don't control the website containing the target HTML. There are countless pages on the web that contain errors that prevent PHP's DOMDocument from working at all, and rightly so, since an XML parser is supposed to fail on error by design. – Michael Butler Nov 27 '12 at 23:19
  • @MichaelButler Even if there are errors, DOMDocument *tries* to parse. When it tries, it parses most of the content, and usually that is sufficient. – Shiplu Mokaddim Nov 28 '12 at 00:06
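The point in the last comment can be checked with a short sketch: `loadHTML()` is fed deliberately broken markup, warnings are collected through `libxml_use_internal_errors()`, and the image is still found. The sample markup and the variable names are assumptions for illustration, and the number of warnings reported depends on the installed libxml version.

```php
<?php
// Deliberately broken, hypothetical markup: unclosed <p> and <span>, plus an HTML5 <section>.
$html = '<section><p>Intro <img src="a.png"><span>caption</section>';

libxml_use_internal_errors(true);   // keep warnings out of the output
$dom = new DOMDocument;
$loaded = $dom->loadHTML($html);    // the HTML parser recovers instead of aborting
$errors = libxml_get_errors();      // whatever the parser complained about
libxml_clear_errors();

$srcs = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}

printf("loaded: %s, warnings: %d, images: %d\n",
       var_export($loaded, true), count($errors), count($srcs));
```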