I have no problem grabbing images from a page with the code below, but how do I modify it to grab both standalone images and images wrapped in an anchor?

        $output = preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $post->post_content, $matches);
Andy Lester
ok1ha
  • [Don't use regex to parse HTML code, instead use a DOM parser!](http://stackoverflow.com/a/1732454/1519058)... – Enissay Apr 03 '14 at 16:26
  • Can you give some example input you'd like to match? Also, your regex will match from the first ` –  Apr 03 '14 at 16:57
  • Do you need to do that on the server side? I mean, if you are grabbing this page from a site that is already published, you are a "client"; perhaps use jQuery instead? – celerno Apr 03 '14 at 18:36

1 Answer

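You can use something like this to grab either the whole image tag or just the value of its src attribute out of the string. It matches an image whether or not it is wrapped in an anchor, because the pattern only looks at the img tag itself: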

$string = '<img src="http://www.google.com/trans.gif">

<a href="http://www.google.com"><img src="http://www.yahoo.com/images/placeholder.gif"></a>';

if (preg_match_all('/<img.*?src=[\'"](.*?)[\'"].*?>/i', $string, $matches)) {
    print "<pre>"; print_r($matches); print "</pre>";
}
else {
    print "Could not find any matches";
}

This outputs the following:

Array
(
    [0] => Array
        (
            [0] => <img src="http://www.google.com/trans.gif">
            [1] => <img src="http://www.yahoo.com/images/placeholder.gif">
        )

    [1] => Array
        (
            [0] => http://www.google.com/trans.gif
            [1] => http://www.yahoo.com/images/placeholder.gif
        )

)
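Both images are found partly because this pattern uses lazy quantifiers (.*?) where the pattern in the question uses greedy ones (.+ and .*). A minimal sketch of the difference, using a made-up one-line input ($html and its file names are assumptions for illustration, not from the question):

```php
<?php
// Two image tags on ONE line, so a greedy ".+" can run across both of them.
$html = '<img src="a.gif"> <img src="b.gif">';

// Pattern from the question: greedy quantifiers.
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $html, $greedy);

// Pattern from this answer: lazy quantifiers.
preg_match_all('/<img.*?src=[\'"](.*?)[\'"].*?>/i', $html, $lazy);

print_r($greedy[1]); // a single entry: b.gif
print_r($lazy[1]);   // two entries: a.gif and b.gif
```

With the greedy version, ".+" swallows the first tag and backtracks only as far as the last src= on the line, so the first image is lost. When every tag is on its own line the two patterns behave alike, since "." does not match a newline by default.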

Explanation of the regex:

<img   .*?   src=   [\'"]   (.*?)   [\'"]   .*?   >
  ^     ^      ^      ^       ^       ^      ^    ^
  1     2      3      4       5       6      7    8
  1. <img Look for the literal text <img, the start of an image tag.
  2. .*? Match any character except a newline (.), zero or more times (*), but as few times as possible (?). The next part of the expression is src=, so it stops consuming characters as soon as src= can match.
  3. src= Look for the exact text src=.
  4. [\'"] A character class that matches either a single or a double quote. The backslash is there only to escape the single quote inside the single-quoted PHP string.
  5. (.*?) The same as number 2, except it is in parentheses so that whatever it matches is captured.
  6. [\'"] Same as number 4.
  7. .*? Same as number 2.
  8. > Look for a literal greater-than sign (the end of the tag).
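As the first comment on the question suggests, a DOM parser is a sturdier alternative to a regex, and it can also report whether an image is wrapped in an anchor. The following is only a sketch of that alternative, not part of the original answer: it assumes PHP's DOM extension is available, and extract_images is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the answer): return each image's src and,
// when the image sits directly inside an <a>, that anchor's href.
function extract_images(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $parent  = $img->parentNode;
        $wrapped = $parent instanceof DOMElement
            && strtolower($parent->nodeName) === 'a';
        $images[] = [
            'src'  => $img->getAttribute('src'),
            'link' => $wrapped ? $parent->getAttribute('href') : null,
        ];
    }
    return $images;
}

$string = '<img src="http://www.google.com/trans.gif">

<a href="http://www.google.com"><img src="http://www.yahoo.com/images/placeholder.gif"></a>';

print_r(extract_images($string));
```

For the sample string, the first image comes back with a null link and the second with http://www.google.com as its link. Only a direct parent anchor is checked here; an image nested deeper inside a link would need a walk up the ancestors.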

Quixrick