Possible Duplicate:
PHP URL to Link with Regex

I am trying to use preg_replace on all links in a text block, so that they are replaced with the string "<< IMAGE >>".

I am trying to match all links in this text block:

"Here we can see the chupacabra trying to eat http://somesite.com/img/chupacabra.jpg. 
look at the the snails somesite.com/img/snails.png"

I am using this regexp:

(https?:\/\/)?.*(jpg|png|jpeg|bmp|gif)

But I can't get it to match only the exact link strings.
What I need is for just the links in this text to be matched:

"Here we can see the chupacabra trying to eat http://somesite.com/img/chupacabra.jpg. look at the the snails somesite.com/img/snails.png"

I am testing on regexpal.com.

Ted

1 Answer

There are two problems. First, * is greedy, so it matches far too much. Second, .* also matches spaces, which don't occur in URLs, so you don't get the right URLs.

The simplest solution is to use \S, which matches any character except whitespace:

(https?:\/\/)?\S*(jpg|png|jpeg|bmp|gif)
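
To see the difference between the two patterns, here is a minimal PHP sketch run against the sample text from the question (joined onto one line). The variable names and the `~` delimiter are assumptions made for illustration, not part of the original patterns.

```php
<?php
// Sample text from the question, joined onto a single line.
$text = 'Here we can see the chupacabra trying to eat '
      . 'http://somesite.com/img/chupacabra.jpg. '
      . 'look at the the snails somesite.com/img/snails.png';

// "~" is an assumed delimiter choice; the \/ escapes are kept from the original.
$greedyDot = '~(https?:\/\/)?.*(jpg|png|jpeg|bmp|gif)~';
$nonSpace  = '~(https?:\/\/)?\S*(jpg|png|jpeg|bmp|gif)~';

preg_match_all($greedyDot, $text, $dotMatches);
preg_match_all($nonSpace, $text, $nonSpaceMatches);

// .* runs across spaces, so the whole text comes back as one match.
print_r($dotMatches[0]);

// \S* stops at whitespace, so each link is matched separately.
print_r($nonSpaceMatches[0]);
```

Note that `\S*` still swallows any non-space characters glued directly to the front of a link, so this version is only a first approximation.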

The better way is to use this one, which uses a lookbehind (?<= |^) so that a match can only start after a space or at the start of the text:

(?<= |^)(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(/\S*)\.(jpg|png|jpeg|bmp|gif)
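
As a rough sketch of how this pattern could be used with preg_replace, assuming `~` as the delimiter (because the pattern contains unescaped slashes) and using the "<< IMAGE >>" replacement string from the question; the variable names are hypothetical:

```php
<?php
// Sample text from the question, joined onto a single line.
$text = 'Here we can see the chupacabra trying to eat '
      . 'http://somesite.com/img/chupacabra.jpg. '
      . 'look at the the snails somesite.com/img/snails.png';

// The lookbehind pattern from above, wrapped in "~" delimiters (an assumption).
$pattern = '~(?<= |^)(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(/\S*)\.(jpg|png|jpeg|bmp|gif)~';

// Replace every matched image link with the placeholder string.
$result = preg_replace($pattern, '<< IMAGE >>', $text);
echo $result, "\n";

// A link glued to preceding characters is not preceded by a space,
// so the lookbehind rejects it and the text is left unchanged.
$glued = 'see tttteerehttp://site.com/1.jpg here';
echo preg_replace($pattern, '<< IMAGE >>', $glued), "\n";
```

One limitation of this sketch: the lookbehind accepts only a literal space or the start of the whole text, so a link that directly follows a newline or tab would not be matched.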
Philipp
  • How can I make it not match tttteerehttp://site.com/1.jpg ? – Ted Feb 05 '13 at 09:17
  • Add a `\b` at the beginning to mark it as a word boundary – Philipp Feb 05 '13 at 12:12
  • @Philipp I'm testing on regexpal.com. Although it sounds very logical, it doesn't work there – Ted Feb 05 '13 at 12:14
  • OK, I see. This is because \S also matches the invalid crap at the beginning. Try `\b(https?:\/\/)?[^ :]*(jpg|png|jpeg|bmp|gif)` – Philipp Feb 05 '13 at 12:19
  • it won't match dfdfdfdfhttp://www.google.com/1.jpg just the part after http:// – Ted Feb 05 '13 at 12:29
  • OK, things are getting more complicated. Try `(?<= |^)(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(/\S*)\.(jpg|png|jpeg|bmp|gif)`. This may not work on regexpal, so you have to test it in PHP, because of the `?<=` syntax at the beginning – Philipp Feb 05 '13 at 12:49
  • `(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\S*)\.(jpg|png|jpeg|bmp|gif)` This one works on regexpal (I omitted the first part; I don't know what it means), but I can't make it work in PHP. I keep getting a warning [here](http://ideone.com/58IFU2) – Ted Feb 05 '13 at 13:33
  • Sure, because you use / as the delimiter that separates the regex from the modifiers, and the pattern itself contains /. You have to use another character, such as ~: `preg_replace('~(https...andsoon...)~')` – Philipp Feb 05 '13 at 13:42
  • I think I found it: `preg_replace('~(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\S*)\.(jpg|png|jpeg|bmp|gif)~','[[image:$0]]',$text);` Can you verify? – Ted Feb 05 '13 at 14:24
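
For reference, a minimal sketch of the delimiter issue discussed in these comments, run against the sample text from the question (joined onto one line). The variable names are hypothetical, and the `@` operator is used only to keep the demonstration quiet.

```php
<?php
// Sample text from the question, joined onto a single line.
$text = 'Here we can see the chupacabra trying to eat '
      . 'http://somesite.com/img/chupacabra.jpg. '
      . 'look at the the snails somesite.com/img/snails.png';

// With "/" as the delimiter, the first unescaped "/" inside the pattern ends it,
// and PHP tries to read the rest as modifiers. preg_replace() then returns null
// and raises a warning (suppressed here with @).
$broken = @preg_replace(
    '/(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\S*)\.(jpg|png|jpeg|bmp|gif)/',
    '[[image:$0]]',
    $text
);
var_dump($broken);

// The same pattern with "~" as the delimiter compiles, since it contains no "~".
$pattern = '~(https?://)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\S*)\.(jpg|png|jpeg|bmp|gif)~';
$result = preg_replace($pattern, '[[image:$0]]', $text);
echo $result, "\n";
```

Because this version drops the lookbehind, it can also match a link that is glued to preceding characters, starting at the `http://` part.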