1

I want to extract the first image link from a given text with PHP. Let's say I have text like this:

Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor sit amet, consectetuer <IMG src="ipsu.jpg" rel="ipsum"/ >

I need to extract lorem.jpg into a PHP variable, so that in the end $variable equals lorem.jpg.
I have tried regular expressions, stripos and similar functions, but every time there was some problem.
If you have any idea how to solve this, please help.

John

4 Answers

3

Regular expressions are good for a wide variety of tasks, but they usually fail when parsing HTML. The structure of an HTML document is so variable that it is hard to extract a tag accurately with a pattern.

We can use a DOM parser such as Simple HTML DOM.

You can use it like this:

$html = 'Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor sit amet, consectetuer <img src="ipsu.jpg" rel="ipsum"/ > ';

$first_image_source = get_first_image($html);
echo $first_image_source;

function get_first_image($html){

  require_once('simple_html_dom.php');

  $post_dom = str_get_html($html);

  $first_img = $post_dom->find('img', 0);

  if($first_img !== null) {
      return $first_img->src;
  }

  return null;
}

You can also get the alt attribute of the image in the same way.

If you want to get the sources of all images, you can use:

function get_images($html){

  require_once('simple_html_dom.php');

  $post_dom = str_get_html($html);

  // find() without an index returns an array of all matching elements
  $img_tags = $post_dom->find('img');

  $images = array();

  foreach($img_tags as $image) {
      $images[] = $image->src;
  }

  return $images;
}

Hope this helps :) :)

Sabari
1

Everyone will tell you that you really need to use an HTML parser rather than a regex (which is true), because there are many cases where a regex cannot parse valid HTML. That being said, if you are absolutely sure that the HTML will be in this format, go for

// $matches[1] is only set when the pattern actually matched
if (preg_match('/src="([^"]*)"/i', $html, $matches)) {
    $image = $matches[1];
}

Use preg_match_all if you need more than the first one. Good luck!
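
As a rough sketch of the preg_match_all variant, assuming the markup always uses double-quoted src attributes as in the question (the function name extract_all_src is hypothetical, chosen only for illustration):

```php
<?php
// Hypothetical helper: collects every double-quoted src value in $html.
// Assumes the simple format from the question; this is not a general HTML parser.
function extract_all_src(string $html): array
{
    // preg_match_all returns the number of full matches (0 if none), or false on error.
    if (!preg_match_all('/src="([^"]*)"/i', $html, $matches)) {
        return [];
    }

    // With the default flags, $matches[1] lists what capture group 1 matched each time.
    return $matches[1];
}

$html = 'Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor sit amet, consectetuer <IMG src="ipsu.jpg" rel="ipsum"/ >';
print_r(extract_all_src($html));
```

For the sample text from the question this yields lorem.jpg and ipsu.jpg, in that order. Like the single-match version, it would also pick up the src of any other tag, so it only suits input you control.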

hackartist
  • +1 for answering the question with a regexp and still providing the correct and more professional alternative of using a DOM parser :) – jamesmortensen Jan 29 '12 at 03:27
1

Many resources on the net will tell you that regular expressions are not recommended for parsing DOM elements. There are several PHP DOM libraries built for exactly the purpose you have in mind: HTML parsing.

The Simple HTML DOM Library is but one example of a library that can be used to extract DOM elements from a page.
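
Another option that needs no extra download is PHP's bundled DOM extension. The following is a minimal sketch, not taken from any of the answers here, and it assumes the DOM extension is enabled; the function name first_image_src is hypothetical:

```php
<?php
// Hypothetical helper: returns the src of the first <img> in $html, or null.
// Assumes PHP's bundled DOM extension (DOMDocument) is available.
function first_image_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse; also avoids handing loadHTML an empty string
    }

    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser lowercases tag names, so <IMG> is found as well.
    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }

    return $img->getAttribute('src');
}

$html = 'Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor sit amet, consectetuer <IMG src="ipsu.jpg" rel="ipsum"/ >';
echo first_image_src($html);
```

With the sample text from the question this prints lorem.jpg. Returning null when there is no image mirrors the behaviour of the Simple HTML DOM example in the first answer.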

jamesmortensen
0

It looks like I cannot post comments on other people's answers, so this is just an extension to hackartist's reply.

Below is a regex that finds the src of the first image tag only, because src="([^"]*)" on its own might also match the src of an iframe.

<img(?:[^>]+)src="([^"]*)"
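
Dropped into preg_match, that pattern might be used as sketched below. It still assumes double-quoted src attributes, and the sample string and variable names are only illustrative:

```php
<?php
$html = '<iframe src="frame.html"></iframe> Lorem ipsum <img rel="lorem" src="lorem.jpg"/> dolor';

// Only a src that follows "<img" inside the same tag is captured,
// so the iframe's src is skipped. The i flag also accepts <IMG ...>.
$first_image = null;
if (preg_match('/<img(?:[^>]+)src="([^"]*)"/i', $html, $matches)) {
    $first_image = $matches[1];
}

echo $first_image;
```

Here $first_image ends up as lorem.jpg even though the iframe's src appears earlier in the string.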
Kleenestar