
I know there are a ton of related questions on SO, but none of them are quite what I'm looking for. I'm trying to implement a PHP function that converts text URLs in a user-generated post into links. I'm using the 'improved' regex from Daring Fireball (http://daringfireball.net/2010/07/improved_regex_for_matching_urls, towards the bottom of the page). The function does not return anything, and I'm not sure why.

<?php
if ( false === function_exists('linkify') ):   
  function linkify($str) {
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';     
return preg_replace($pattern, "<a href=\"\\0\" rel=\"nofollow\" target=\"_blank\">\\0</a>", $str);      
}
endif;
?>

Can someone please help me get this to work? Thanks!

Jeff
  • This exact question came up before, but it's indeed difficult to google. Enabling `error_reporting` would have told you *instantly*. – mario Apr 03 '12 at 22:29
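
To illustrate the point about `error_reporting`, here is a minimal sketch of what turning it on reveals. It assumes PHP 7 or later. The short pattern is a simplified stand-in for the Daring Fireball regex, and `$warnings` is a hypothetical variable used only to capture the message. Because the pattern starts with `(`, PHP treats the parentheses as bracket-style delimiters and tries to read everything after the first `)` as modifiers.

```php
<?php
// Surface every warning and notice while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Collect warnings in an array so they can be inspected afterwards
// ($warnings is a hypothetical helper variable, not part of the question).
$warnings = [];
set_error_handler(function (int $severity, string $message) use (&$warnings): bool {
    $warnings[] = $message;
    return true; // mark the error as handled
});

// Simplified stand-in pattern, deliberately written WITHOUT delimiters and
// starting with an inline-flag group, like the pattern in the question.
$result = preg_replace('(?i)\bhttps?://\S+', '<a href="\0">\0</a>', 'see http://example.com');

restore_error_handler();

var_dump($result);  // NULL, because preg_replace() failed to compile the pattern
print_r($warnings); // the captured warning text explains why
```

A function that returns NULL prints as nothing, which would match the "does not return anything" symptom.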

2 Answers


Try this:

$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`\!()\[\]{};:\'".,<>?«»“”‘’]))';     
return preg_replace("!$pattern!i", "<a href=\"\\0\" rel=\"nofollow\" target=\"_blank\">\\0</a>", $str); 

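PHP's preg functions require delimiters around the pattern. The i after the closing delimiter makes the match case-insensitive (the pattern's leading (?xi) already does this, so the extra i is redundant but harmless).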

Update

If you use # as the delimiter, you won't need to escape the ! in the pattern, so you can use the original pattern string as-is (it does not contain a #): "#$pattern#i"

Update 2

To ensure that the links are correct, do this:

$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace_callback("#$pattern#i", function($matches) {
    $input = $matches[0];
    $url = preg_match('!^https?://!i', $input) ? $input : "http://$input";
    return '<a href="' . $url . '" rel="nofollow" target="_blank">' . "$input</a>";
}, $str); 

This now prepends http:// to URLs that lack a scheme, so that the browser doesn't treat them as relative links.
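
The callback above places the matched text into the HTML without escaping it. As a rough sketch of one common way to address that, the variant below HTML-escapes both the surrounding text and each URL with `htmlspecialchars`. The function name `linkify_escaped` and the short pattern are assumptions made for illustration: the pattern is a simplified stand-in, not the Daring Fireball regex, and this is not a complete XSS review.

```php
<?php
/**
 * Hypothetical variant of linkify() that escapes everything it outputs.
 * The pattern is a simplified stand-in wrapped in ONE capturing group so
 * that preg_split() can return the URLs alongside the text between them.
 */
function linkify_escaped(string $str): string
{
    $pattern = '#\b((?:https?://|www\.)[^\s<>"\']+)#i';

    // With PREG_SPLIT_DELIM_CAPTURE and a single group, even indexes hold
    // plain text and odd indexes hold the captured URLs.
    $parts = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $html = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Plain text between URLs: escape it as-is.
            $html .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // URL: prepend a scheme when missing, then escape for the attribute and the label.
        $url = preg_match('#^https?://#i', $part) ? $part : "http://$part";
        $html .= '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
            . '" rel="nofollow" target="_blank">'
            . htmlspecialchars($part, ENT_QUOTES, 'UTF-8') . '</a>';
    }
    return $html;
}

echo linkify_escaped('Visit www.example.com/?a=1&b=2 <b>now</b>'), "\n";
```

Splitting first and escaping each piece separately avoids running the URL regex over already-escaped text, where entities such as `&quot;` could be swallowed into a match.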

d_inevitable
  • Thanks. Will try to avoid answers without explanation in the future. – d_inevitable Apr 03 '12 at 22:33
  • Thank you @d_inevitable ! It seems to be recognizing the links correctly now. Works perfectly for links starting with 'http:'. However for something like 'www.google.com' the new tab address shows 'http//www.mysite.com/directory/www.google.com' I'm a bit of a novice-- thanks so much! – Jeff Apr 03 '12 at 22:38
  • @Jeff this is because the new link must include the `http://` prefix in the `href` attribute. I don't think you can do this with a single function call. Try `preg_replace_callback` and an if-statement that prepends `http://` when necessary. – d_inevitable Apr 03 '12 at 22:40
  • Thanks again. I'll look into that and give it a shot. – Jeff Apr 03 '12 at 22:45
  • @Jeff I've updated my answer accordingly. – d_inevitable Apr 03 '12 at 22:48
  • Wow, that's great! Seems to work for `www.google.com` and `https://www.google.com` but `http://stackoverflow.com` ends up as `http://http//stackoverflow.com/` – Jeff Apr 03 '12 at 23:01
  • `$url = preg_match('!^http?s://!i', $input) ? $input : "http://$input";` should be changed to `$url = preg_match('!^https?://!i', $input) ? $input : "http://$input";` The question mark just had to be moved one spot over. – Jeff Apr 03 '12 at 23:36
  • This is working great. I'm wondering if you have any advice on making it XSS-safe in PHP. Thanks! http://stackoverflow.com/questions/10319284/is-this-linkify-method-at-risk-for-xss-attacks – Jeff Apr 25 '12 at 16:27

I only wanted to extract the URLs from a string, using the same regex as in d_inevitable's answer above. I didn't need to turn them into links or keep the rest of the string, so this is what I did. Hope it helps.

/**
 * Returns the URLs found in a string as an array.
 * This does NOT return the string itself, only the URLs within it.
 */
function get_urls($str){

    $regex = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
    preg_match_all("#$regex#i", $str, $matches);
    $urls = $matches[0];
    return $urls;

}
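
As a rough illustration of how the result of `preg_match_all` is laid out and how a caller might use it, here is a short sketch. It uses a simplified stand-in pattern rather than the full regex above, and the sample text and variable names are made up for the example.

```php
<?php
// Simplified stand-in pattern (an assumption), not the full regex above.
$pattern = '#\bhttps?://[^\s<>"]+#i';
$text = 'Docs at http://example.com/a and https://example.org/b plus http://example.com/a again';

// preg_match_all() returns the number of full matches and fills
// $matches[0] with the full-match strings, in order of appearance.
$count = preg_match_all($pattern, $text, $matches);

// Drop repeated URLs and reindex the array from zero.
$urls = array_values(array_unique($matches[0]));

echo $count, "\n"; // 3 matches in the sample text
print_r($urls);    // two distinct URLs remain
```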
Kyle Coots