Regex: how to catch URLs ending with .gif

Question

Hi, I want to write a script that uses a regex to catch all GIF image URLs with PHP. Here is what I did:

<?php
$subject = file_get_contents("http://www.9gag.com");
$search="^https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?]+)+\.(?:jpg|gif|png)$";
preg_match($search, $subject, $result); 
print_r($result);
?>

My example is not working. I searched stackoverflow.com and read some examples, but I think that was not enough. Thanks.

I need to catch the URLs of GIF images.
I need this built with PHP and regex.

Your regex starts with `^` and ends with `$`, meaning the only match would have to start and end with a URL instead of including a URL somewhere inside it. I think this is your issue — Jason Sperske, Nov 07 '13 at 22:46
You're missing the delimiters at the beginning and end of the regexp. Aren't you getting errors from this? — Barmar, Nov 07 '13 at 22:47
I am getting this error: Warning: preg_match() [function.preg-match]: Unknown modifier '/' in C:\wamp\www\book\9gag.php on line 10 — synan54, Nov 07 '13 at 22:49
@Barmar I will retract and downgrade my previous comment. Sure you can use a regular expression here, *but*: A) You should use a DOM parser to extract the `href` values only from relevant tag types. B) This regular expression is a mess and will break if the hostname contains a number, is an IP, if there's a non-standard port declared, and probably other ways. — Sammitch, Nov 07 '13 at 22:56
@Sammitch, as I got my hands dirty trying to get a stable and correct answer to this question I was reminded (yet again) why DOM parsers are a much saner way to accomplish this. Does PHP have something like Python's [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/)? — Jason Sperske, Nov 07 '13 at 23:03
@Sammitch A DOM parser is appropriate if you're extracting URLs from specific places like src attributes. I assumed (probably incorrectly) that he was searching text for them. — Barmar, Nov 07 '13 at 23:06
@Barmar it's still problematic at best. There are plenty of sites/services that serve image URLs like `http://service.com/image.php?im=file.gif`. Then there's the problem of where the enclosing text begins/ends, etc. I started writing a regex that *might* work for giggles, but once the debuggex diagram went off the edge of my 1920px wide monitor it became apparent that this was a losing prospect. — Sammitch, Nov 07 '13 at 23:10
@Sammitch That's a problem even if you do use a DOM parser, if he wants to filter out those elements. He'll need to match the contents of the `src` attribute against the regexp. — Barmar, Nov 07 '13 at 23:13
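
On the DOM parser idea discussed in these comments: PHP bundles a DOM extension (`DOMDocument`) that can play that role. The following is only a minimal sketch of the approach Sammitch and Barmar describe (pull `src` values out of `img` tags, then test each value with a small regex). The sample markup, the `extractGifUrls` helper name, and the choice to test only the URL path are assumptions made for illustration, not something taken from the thread.

```php
<?php
// Hypothetical sample markup; in the question this would come from file_get_contents().
$html = <<<'HTML'
<html><body>
  <img src="http://example.com/a/cat.gif" alt="cat">
  <img src="http://example.com/b/dog.png" alt="dog">
  <img src="https://cdn.example.com/img/loop.GIF?size=large" alt="loop">
  <a href="http://example.com/page.html">not an image</a>
</body></html>
HTML;

// Hypothetical helper: returns the src of every <img> whose URL path ends in .gif.
function extractGifUrls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Look only at the path component so a query string does not hide the extension.
        $path = parse_url($src, PHP_URL_PATH);
        if (is_string($path) && preg_match('~\.gif$~i', $path)) {
            $urls[] = $src;
        }
    }
    return $urls;
}

$gifUrls = extractGifUrls($html);
print_r($gifUrls);
```

This does not cover the `image.php?im=file.gif` style of URL mentioned above, because the extension there lives in the query string rather than in the path.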

score 2 · Accepted Answer · edited May 23 '17 at 11:59

Your regex starts with `^` and ends with `$`, so the only possible match is a subject that starts and ends with a URL, not one that includes a URL somewhere inside it. Try this (combining a more general URL regex with yours):

/(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)(?:jpg|gif|png)/

or in PHP:

preg_match_all("/(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)(?:jpg|gif|png)/", $input_lines, $output_array);

And here is an online demo for this regex using the source code to this page (before I made this edit) (look at the preg_match_all tab).
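
To make the point about `^` and `$` concrete, here is a small sketch that reuses the much shorter pattern from the question instead of the long one above. It assumes a made-up sample string, wraps the pattern in `~` delimiters, drops the anchors, and narrows the extension list to `gif`. Excluding whitespace, quotes and angle brackets from the path and adding `\b` at the end are my own assumptions so that the match stops at the end of the URL.

```php
<?php
// Hypothetical sample text standing in for the downloaded page.
$subject = 'See http://example.com/img/cat.gif and '
         . '<img src="https://cdn.example.org/pics/dog.png"> plus '
         . 'http://example.com/docs/readme.txt for details.';

// The question's pattern with delimiters added and ^ / $ removed, so it can
// match inside longer text. The i flag makes the match case-insensitive.
$pattern = '~https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?\s"\'<>]+)+\.gif\b~i';

// preg_match_all collects every match; preg_match would stop at the first one.
preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
```

As noted in the comments on the question, this shorter pattern still rejects hostnames that contain digits, IP addresses, and explicit ports, so treat it as a starting point only.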

answered Nov 07 '13 at 22:49 by Jason Sperske · edited May 23 '17 at 11:59 by Community

I get this error: Warning: preg_match() [function.preg-match]: Unknown modifier '/' in C:\wamp\www\book\9gag.php on line 10 – synan54 Nov 07 '13 at 22:51

@synan54 You can use any delimiter; use `%` instead – AlexP Nov 07 '13 at 22:54

Updated with something that should be more helpful (and a link to a working example of it) – Jason Sperske Nov 07 '13 at 23:03

This gives another error: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '(' in C:\wamp\www\book\9gag.php on line 10 – synan54 Nov 07 '13 at 23:09

You're still missing the delimiters at the beginning and end. – Barmar Nov 07 '13 at 23:13

It should work now (added the regex delimiters and an escaped PHP example) – Jason Sperske Nov 07 '13 at 23:40
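
The "Unknown modifier" warnings in this thread come from the missing delimiters. A short sketch of what seems to happen with the pattern from the question, and of the `%` alternative suggested above, follows. The sample URL and the variable names are assumptions for illustration.

```php
<?php
// Hypothetical subject: a string that is nothing but one URL.
$subject = 'http://example.com/img/cat.gif';

// No delimiters: PHP takes the leading ^ as the delimiter, finds the closing ^
// inside [^/#?], and then reads the "/" that follows as a modifier.
$noDelimiters = '^https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?]+)+\.(?:jpg|gif|png)$';
$broken = @preg_match($noDelimiters, $subject); // @ hides the warning; the call returns false

// Same pattern with / delimiters: every literal slash has to be escaped.
$slashDelimited = '/^https?:\/\/(?:[a-z\-]+\.)+[a-z]{2,6}(?:\/[^\/#?]+)+\.(?:jpg|gif|png)$/';

// Same pattern with % delimiters: slashes can stay as they are.
$percentDelimited = '%^https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?]+)+\.(?:jpg|gif|png)$%';

$withSlash = preg_match($slashDelimited, $subject);
$withPercent = preg_match($percentDelimited, $subject);

var_dump($broken, $withSlash, $withPercent);
```

Both delimited patterns keep `^` and `$`, so they only match here because the subject is a single URL; searching a whole page still requires dropping the anchors.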
