Regular expressions are good for a wide variety of tasks, but they usually fail when parsing HTML. The problem is that HTML markup is so variable (attribute order, quoting, stray whitespace) that it is hard to extract a tag reliably with a pattern.
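To make that concrete, here is a small sketch of how a plausible-looking pattern goes wrong. The pattern and the variable names ($naive_pattern, $naive_sources) are only illustrative assumptions; the point is that it silently skips any tag where src is not the first attribute.

```php
<?php
// Two <img> tags whose attributes appear in a different order.
$html = 'Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor sit amet, '
      . 'consectetuer <img src="ipsu.jpg" rel="ipsum"/ > ';

// Naive pattern (illustrative only): assumes src directly follows "<img".
$naive_pattern = '/<img\s+src="([^"]*)"/i';

preg_match_all($naive_pattern, $html, $matches);
$naive_sources = $matches[1];

// The first tag is missed because rel comes before src,
// so the "first image" reported here is really the second one.
print_r($naive_sources);
```

You could keep patching the pattern for each new variation, but a parser handles all of them at once.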
A more robust option is a DOM parser such as Simple HTML DOM (the simple_html_dom.php library).
You can use it like this:
$html = 'Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor sit amet, consectetuer <img src="ipsu.jpg" rel="ipsum"/ > ';
$first_image_source = get_first_image($html);
echo $first_image_source;
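function get_first_image($html){
    require_once('simple_html_dom.php');
    $post_dom = str_get_html($html);
    // str_get_html() returns false for input it refuses to parse, such as an empty string
    if (!$post_dom) {
        return null;
    }
    // find('img', 0) returns the first <img> element, or null if there is none
    $first_img = $post_dom->find('img', 0);
    if($first_img !== null) {
        return $first_img->src;
    }
    return null;
}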
You can read other attributes of the image, such as alt, in the same way ($first_img->alt).
If you want the sources of all images, you can use:
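function get_images($html){
    require_once('simple_html_dom.php');
    $post_dom = str_get_html($html);
    $images = array();
    // str_get_html() returns false for input it refuses to parse; report no images then
    if (!$post_dom) {
        return $images;
    }
    // Without an index, find() returns an array of every matching element
    $img_tags = $post_dom->find('img');
    foreach($img_tags as $image) {
        $images[] = $image->src;
    }
    return $images;
}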
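If you cannot add a third-party file to the project, the same idea can probably be sketched with PHP's bundled DOM extension (DOMDocument) instead of Simple HTML DOM. This is a different parser, not part of the library above, and get_image_attributes() is a made-up helper name used only for illustration. It collects both src and alt for every image:

```php
<?php
// Hypothetical helper: returns the src and alt of every <img> in an HTML string,
// using PHP's built-in DOMDocument rather than simple_html_dom.php.
function get_image_attributes($html) {
    $images = array();
    if (trim($html) === '') {
        return $images; // loadHTML() rejects empty input, so bail out early
    }

    $doc = new DOMDocument();
    // Collect libxml's complaints about sloppy markup instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = array(
            'src' => $img->getAttribute('src'),
            'alt' => $img->getAttribute('alt'), // empty string when the attribute is absent
        );
    }
    return $images;
}

// Attribute order differs between the two tags; the parser does not care.
$sample = '<p><img src="a.png" alt="A"> and <img alt="B" src="b.png"></p>';
$found = get_image_attributes($sample);
print_r($found);
```

Note that getAttribute() gives back an empty string for a missing attribute, so check for '' if you need to tell "no alt" apart from a real value.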
Hope this helps :) :)