Hi, I want to write a script that uses a regex to catch all GIF image URLs with PHP. Here is what I did:

<?php
$subject = file_get_contents("http://www.9gag.com");
$search="^https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?]+)+\.(?:jpg|gif|png)$";
preg_match($search, $subject, $result); 
print_r($result);
?>

My example is not working. I searched stackoverflow.com and read some examples, but I don't think that was enough. Thanks.

  1. I need to catch GIF image URLs
  2. I need this built with PHP and regex
synan54
  • 5
    Your regex starts with a `^` and ends with a `$`, meaning it can only match when the whole subject is a URL, not when a URL appears somewhere inside it. I think this is your issue – Jason Sperske Nov 07 '13 at 22:46
  • 4
    You're missing the delimiters at the beginning and end of the regexp. Aren't you getting errors from this? – Barmar Nov 07 '13 at 22:47
  • 1
    i am getting this error Warning: preg_match() [function.preg-match]: Unknown modifier '/' in C:\wamp\www\book\9gag.php on line 10 – synan54 Nov 07 '13 at 22:49
  • 1
    @Barmar I will retract and downgrade my previous comment. Sure you can use a regular expression here, *but*: A) You should use a DOM parser to extract the `href` values only from relevant tag types. B) This regular expression is a mess and will break if the hostname contains a number, is an IP, if there's a non-standard port declared, and probably other ways. – Sammitch Nov 07 '13 at 22:56
  • 1
    @Sammitch, as I got my hands dirty trying to get a stable and correct answer to this question I was reminded (yet again) why DOM parsers are a much saner way to accomplish this. Does PHP have something like Python's [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/)? – Jason Sperske Nov 07 '13 at 23:03
  • @Sammitch A DOM parser is appropriate if you're extracting URLs from specific places like src attributes. I assumed (probably incorrectly) that he was searching text for them. – Barmar Nov 07 '13 at 23:06
  • @Barmar it's still problematic at best. There are plenty of sites/services that serve image URLs like `http://service.com/image.php?im=file.gif`. Then there's the problem of where the enclosing text begins/ends, etc. I started writing a regex that *might* work for giggles, but once the debuggex diagram went off the edge of my 1920px wide monitor it became apparent that this was a losing prospect. – Sammitch Nov 07 '13 at 23:10
  • @Sammitch That's a problem even if you do use a DOM parser, if he wants to filter out those elements. He'll need to match the contents of the `src` attribute against the regexp. – Barmar Nov 07 '13 at 23:13
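
The approach discussed in these comments (parse the HTML with a DOM parser, then test each `src` value against a regex) might look roughly like the sketch below. It uses PHP's built-in `DOMDocument`; the function name `extractGifUrls`, the sample HTML, and the simplified `.gif` pattern are assumptions made for illustration, not something taken from the thread.

```php
<?php
// Hypothetical helper: return the src of every <img> whose URL ends in ".gif"
// (optionally followed by a query string or fragment).
function extractGifUrls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // "~" is used as the delimiter so "/" needs no escaping; "i" ignores case.
        if (preg_match('~\.gif(?:[?#].*)?$~i', $src)) {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Made-up sample markup standing in for a downloaded page.
$html = '<p><img src="http://example.com/a.gif"><img src="/b.png">'
      . '<img src="http://example.com/c.GIF?x=1"></p>';
print_r(extractGifUrls($html));
```

This only looks at `img` elements, so GIF URLs that appear in plain text, CSS, or scripts would be missed; whether that matters depends on what the page actually contains.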

1 Answer

Your regex starts with a ^ and ends with a $, so it only matches when the entire subject is a single URL, not when a URL appears somewhere inside it. It is also missing the delimiters that PHP's preg functions require. Try this (combining this URL regex with yours):

/(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)(?:jpg|gif|png)/

or in PHP:

preg_match_all("/(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)(?:jpg|gif|png)/", $input_lines, $output_array);

And here is an online demo for this regex using the source code to this page (before I made this edit) (look at the preg_match_all tab).
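
For comparison, here is a minimal sketch that stays closer to the pattern in the question. The sample `$subject` string is made up, and restricting the match to `.gif` and excluding quotes, angle brackets, and whitespace from path segments are assumptions about what the asker wants. The "Unknown modifier '/'" warning most likely arises because, without delimiters, PHP treats the leading `^` as the delimiter, ends the pattern at the next `^` (inside `[^/#?]`), and then reads the following `/` as a modifier.

```php
<?php
// Made-up input standing in for the downloaded page.
$subject = 'x <img src="http://img.example.com/photo/a1.gif"> y '
         . 'https://cdn.example.org/pics/b.png z http://example.com/page.html';

// "~" delimiters mean "/" needs no escaping. There is no ^ or $, so a URL
// can be found anywhere in the text. The "i" modifier ignores case.
$pattern = '~https?://(?:[a-z0-9-]+\.)+[a-z]{2,6}(?:/[^/#?\s"\'<>]+)+\.gif~i';

// preg_match_all collects every match; preg_match would stop at the first one.
preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
```

Like the original, this pattern still ignores ports, IP-address hosts, and URLs that carry the file name in a query string, so it is best treated as a starting point.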

Community
Jason Sperske