1
$bits = preg_split('#((?:https?|ftp)://[^\s\'"<>()]+)#S', $token->data, -1, PREG_SPLIT_DELIM_CAPTURE);

Say I'm trying to match URLs that need to be linkified. The pattern above is too permissive.

I want to match only bare URLs like http://google.com, but not URLs that are already part of markup, such as <a href="http://google.com">http://google.com</a> or <iframe src="http://google.com"></iframe>.
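To make the problem concrete, here is a small sketch of what the split does on mixed input. The sample string in `$data` is an assumption made up for illustration; the pattern is the one from the question.

```php
<?php
// Hypothetical sample input: one bare URL and one URL already inside an <a> tag.
$data = 'See http://google.com or <a href="http://example.com">here</a>';

// Same pattern as in the question.
$pattern = '#((?:https?|ftp)://[^\s\'"<>()]+)#S';

// With PREG_SPLIT_DELIM_CAPTURE, the odd-indexed pieces are the captured URLs.
$bits = preg_split($pattern, $data, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($bits as $i => $bit) {
    echo ($i % 2 ? 'URL:  ' : 'TEXT: ') . $bit . "\n";
}
```

Both URLs come back as captured pieces, including the one inside the `href` attribute, which is exactly the over-matching described above.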

wamp
  • Can you explain what you are trying to do, eventually? – Tomalak Sep 25 '10 at 09:36
  • I'm trying to linkify URLs, wrapping each one in `<a>` **only when necessary**. – wamp Sep 25 '10 at 09:46
  • I thought so. This question has been asked here a couple of dozen times. Please [consider searching](http://stackoverflow.com/search?q=), one of the answers might just do what you want. – Tomalak Sep 25 '10 at 10:59

4 Answers

2

It appears that you're trying to parse HTML using regular expressions. You might want to rethink that.

Nick Bastin
  • How is matching a URL in a string parsing HTML? – grapefrukt Sep 25 '10 at 08:46
  • You're matching the URL within an HTML context. Load the HTML into a DOMDocument and then test each text node against your pattern. – Justin Johnson Sep 25 '10 at 08:50
  • @wamp: If you're specifically trying to avoid a greedy algorithm that eats HTML tags, that must mean you're in a position (at least sometimes) where your link will be embedded in HTML. And that way lies madness. – Nick Bastin Sep 25 '10 at 18:59
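
The DOMDocument suggestion from the comments above could look roughly like the sketch below. It is only one way to do it, under some assumptions: the helper name `linkifyHtml` and the sample input are made up for illustration, the URL pattern is the one from the question, and text inside an existing `<a>` is skipped with an XPath filter.

```php
<?php
// Hypothetical helper (not from the thread): linkify bare URLs in an HTML fragment.
function linkifyHtml(string $html): string
{
    // Same URL pattern as in the question.
    $pattern = '#((?:https?|ftp)://[^\s\'"<>()]+)#S';

    libxml_use_internal_errors(true);            // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');  // wrapper div marks our fragment
    libxml_clear_errors();
    $root = $doc->getElementsByTagName('div')->item(0);

    // Text nodes only, and only those not already inside a link.
    $xpath = new DOMXPath($doc);
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));

    foreach ($textNodes as $node) {
        $bits = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($bits) < 2) {
            continue;                            // no URL in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($bits as $i => $bit) {
            if ($i % 2 === 1) {                  // odd pieces are captured URLs
                $a = $doc->createElement('a');
                $a->setAttribute('href', $bit);
                $a->appendChild($doc->createTextNode($bit));
                $fragment->appendChild($a);
            } elseif ($bit !== '') {
                $fragment->appendChild($doc->createTextNode($bit));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize the wrapper's children back to an HTML string.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = 'Visit http://google.com or <a href="http://google.com">http://google.com</a>'
      . ' <iframe src="http://google.com"></iframe>';
echo linkifyHtml($html), "\n";
```

Because only text nodes are tested, URLs in attributes such as `href` or `src` are never touched, and the `not(ancestor::a)` filter leaves existing link text alone. Note that `loadHTML()` assumes ISO-8859-1 unless the input declares another encoding, so non-ASCII input would need extra handling that this sketch leaves out.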
0

Try this:

function validUrl($url) {
    // Returns the preg_match() groups for a valid URL, or FALSE otherwise.
    $regex  = '#(^';                     // match[1]: the whole URL
    $regex .= '((https?|ftps?)://)?';    // match[2]: scheme with "://", match[3]: scheme alone
    $regex .= '(([0-9a-z-]+\.)+';        // match[4]: full host, match[5]: last label before the TLD
    $regex .= '([a-z]{2,3}|aero|coop|jobs|mobi|museum|name|travel))'; // match[6]: TLD (list is incomplete)
    $regex .= '(:[0-9]{1,5})?';          // match[7]: port
    $regex .= '(/[^ ]*)?';               // match[8]: path and query string
    $regex .= '$)#i';
    if (!preg_match($regex, $url, $matches)) {
        return FALSE;
    }
    // gethostbyname() returns the unmodified hostname when the lookup fails.
    if (gethostbyname($matches[4]) === $matches[4]) {
        return FALSE;
    }
    return $matches;
}
jatt
0

RE

http:\/\/[a-zA-Z0-9\.\-]*

Result

Array
(
    [0] => http://google.com
)
Amit Kumar Gupta
0

A more general RE that also matches ftp:// URLs

(?:http|ftp):\/\/[a-zA-Z0-9\.\-]*

Result

Array
(
    [0] => Array
        (
            [0] => ftp://article-stack.com
            [1] => http://google.com
        )
)
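
A nested array of this shape is what `preg_match_all()` returns, so the result can be reproduced with something like the sketch below. The sample text in `$text` is an assumption (the input that was actually used is not shown), and the scheme is written as an explicit `http`/`ftp` alternation so that strings such as `htp://` or `fttp://` are not accepted.

```php
<?php
// Hypothetical sample text; the input that was actually used is not shown.
$text = 'Mirror: ftp://article-stack.com, search: http://google.com';

// Scheme alternation followed by a run of host characters
// (letters, digits, dots, hyphens).
$pattern = '#(?:http|ftp)://[a-zA-Z0-9.\-]*#';

// $matches[0] holds every full match found in $text.
preg_match_all($pattern, $text, $matches);
print_r($matches);
```

The character class stops at the first character that is not a letter, digit, dot, or hyphen, so paths and query strings are cut off. That also means the pattern matches URLs inside attributes like `href`, so it does not by itself address the markup problem from the question.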
Amit Kumar Gupta