
Oddly enough, I haven't found anywhere that answers this question specifically; all the other Stack Overflow answers I've found aren't exactly right.

I have a body of text that I need to search for image URLs. Nothing complex, just things like:

http://www.google.com/logo.png

http://reddit.com/idfaiodf/test.jpg

NOT

http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE

All the regexes I've tried include the "MORECONTENTHERE" in the results. It's frustrating as hell. I just want the URL, with nothing appended after it or added before it!

Also, I don't want anything that does HTML image-link extraction; I'm not pulling these from HTML.

Any regex to do this?

EDIT:

So here is what I'm using as a source: http://pastebin.com/dE2s1nHz

It's HTML but I didn't want to mention that because I didn't want people to do

  • If you're not pulling these from HTML please post an example of where you are getting them from. Without that it's going to be very difficult to avoid either trapping your third example, or not trapping your first two. –  Aug 07 '13 at 03:14
  • Ok, adding an example now – Matthew 'mandatory' Bryant Aug 07 '13 at 03:50
  • possible duplicate of [PHP: Regular Expression to get a URL from a string](http://stackoverflow.com/questions/2720805/php-regular-expression-to-get-a-url-from-a-string) – Andy Lester Aug 07 '13 at 04:15

4 Answers

https?://[^/\s]+/\S+\.(jpg|png|gif)
  1. https? is "http" or "https"
  2. :// is literal
  3. [^/\s]+ is one or more characters that are neither "/" nor whitespace (the host)
  4. / is literal
  5. \S+ is one or more non-whitespace characters
  6. \. is a literal "."
  7. (jpg|png|gif) is one of the image extensions, separated by |

Result:

[Screenshot: RegexBuddy match results]

The result above is from RegexBuddy, running under Wine on a Mac. Its "PCRE" flavor is equivalent to PHP's preg_* functions. The expression should work in most regular-expression flavors.
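One way to check the pattern with PHP's `preg_match_all()` is sketched below, assuming the three sample URLs from the question sit on separate lines of `$text`. The pattern is wrapped in `~` delimiters so its `/` characters need no escaping. Note that the expression has no trailing boundary, so on the third sample it still returns the `http://reddit.com/sadfasdf/test.jpg` prefix.

```php
<?php
// Sample input: the three URLs from the question, one per line.
$text = "http://www.google.com/logo.png\n"
      . "http://reddit.com/idfaiodf/test.jpg\n"
      . "http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE";

// Same expression as above, wrapped in ~ delimiters for the preg_* functions.
$pattern = '~https?://[^/\s]+/\S+\.(jpg|png|gif)~';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches, $matches[1] the captured extensions.
print_r($matches[0]);
print_r($matches[1]);
```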

Luke
  • You haven't escaped the literal /'s – pjcard Apr 24 '15 at 14:04
  • You do not need to escape `/` unless you use it as a delimiter in PHP's `preg_*` functions. See http://php.net/manual/en/regexp.reference.delimiters.php. The delimiters are not part of the expression, so they are omitted. It is quite common to see `/` as a delimiter, but if you use `/` in the pattern it is often best to pick a different delimiter rather than escaping it. – Luke Jun 07 '16 at 22:13
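To illustrate the delimiter point in the comment above, here is a small sketch: the same expression written with `/` delimiters (so every literal `/`, including the one inside the character class, must be escaped) and with `~` delimiters (no escaping) finds the same matches. The sample sentence is made up; the URL is the one from the question.

```php
<?php
$text = 'See http://www.google.com/logo.png for the logo.';

// With "/" as the delimiter, each literal "/" in the pattern must be escaped.
$slashDelimited = '/https?:\/\/[^\/\s]+\/\S+\.(jpg|png|gif)/';

// With "~" as the delimiter, "/" is an ordinary character and needs no escape.
$tildeDelimited = '~https?://[^/\s]+/\S+\.(jpg|png|gif)~';

preg_match_all($slashDelimited, $text, $a);
preg_match_all($tildeDelimited, $text, $b);

print_r($a[0]);
var_dump($a === $b); // both patterns produce identical match arrays
```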

This matches from "http" up to a known image extension, ignoring anything that follows the extension.

<?php

    $string = "Oddly enough I haven't found anywhere that has answer this question specificly, all the other stack overflow things I've found aren't exactly right.

    I have a body text I need to search through for image urls, this doesn't mean anything complex but basically things like:

        http://www.google.com/logo.png

        http://reddit.com/idfaiodf/test.jpg

    NOT

        http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE
    ";

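    // "http", then (greedily) anything up to a "." followed by a known image extension; i = case-insensitive.
    $pattern = '~(http.*\.)(jpe?g|png|[tg]iff?|svg)~i';

    // preg_match_all() fills $matches[0] with every full match.
    $m = preg_match_all($pattern, $string, $matches);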

    print_r($matches[0]);

?>

Output

Array
(
    [0] => http://www.google.com/logo.png
    [1] => http://reddit.com/idfaiodf/test.jpg
    [2] => http://reddit.com/sadfasdf/test.jpg
)
AbsoluteƵERØ
  • The problem with this is that it will match any URL before the image up to and including the image URL. Try putting a link before an image and the match will extend to encapsulate both – Alex Jan 28 '15 at 15:37
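The effect described in this comment can be reproduced with a line holding an ordinary link before an image URL (the `example.com` addresses are made up for illustration). Replacing `.*` with `\S+`, which cannot cross whitespace, is one way to keep each match inside a single URL; this is a suggested variant, not part of the original answer.

```php
<?php
// Hypothetical input: a plain link followed by an image URL on the same line.
$line = 'see http://example.com/page and http://example.com/a.png';

// Original pattern: the greedy .* runs from the first "http" to the last usable "." on the line.
$greedy = '~(http.*\.)(jpe?g|png|[tg]iff?|svg)~i';

// Variant: \S+ stops at whitespace, so one match cannot span two URLs.
$tight = '~https?://\S+\.(?:jpe?g|png|[tg]iff?|svg)~i';

preg_match_all($greedy, $line, $g);
preg_match_all($tight, $line, $t);

print_r($g[0]); // one match covering both URLs
print_r($t[0]); // only the image URL
```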

Try the following code:

<?php
$text = <<<EOD
http://www.google.com/logo.png
http://reddit.com/sadfasdf/test.jpgMORECONTENTHERE
http://reddit.com/idfaiodf/test.jpg
EOD;

// The trailing \b rejects "test.jpgMORECONTENTHERE": there is no word boundary between "g" and "M".
preg_match_all('/\bhttps?:\/\/\S+\.(?:png|jpg)\b/', $text, $matches);
var_dump($matches[0]);
falsetru
https?://[a-zA-Z0-9.-]+/[a-zA-Z0-9&./-]+\.(jpg|png|gif|tif|exf|svg|wfm)

I picked some arbitrary image types and possibly missed a few special characters allowed in URLs. Feel free to customize it for your needs.
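Since the extension list is meant to be customized, one option is to build the alternation from an array with `preg_quote()` and `implode()`. This is a sketch: the helper name `imageUrlPattern()` and the extension list are illustrative, and the host and path character classes (a `+` on the host class, `/` allowed in the path) are assumptions about the intent of the pattern above.

```php
<?php
/**
 * Hypothetical helper: builds a URL-matching pattern for the given image extensions.
 * The host and path character classes are assumptions; widen them as needed.
 */
function imageUrlPattern(array $extensions): string
{
    // Escape each extension for use inside a regex that uses "~" as its delimiter.
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '~');
    }, $extensions);

    return '~https?://[a-zA-Z0-9.-]+/[a-zA-Z0-9&./_-]+\.(' . implode('|', $quoted) . ')~i';
}

$pattern = imageUrlPattern(['jpg', 'jpeg', 'png', 'gif', 'svg']);

$text = "http://www.google.com/logo.png\nhttp://reddit.com/idfaiodf/test.jpg";
preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
```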

zebediah49