I posted this question a while back and it is working great for finding and 'linkifying' links from user generated posts. Linkify Regex Function PHP Daring Fireball Method
<?php
if (!function_exists("html")) {
function html($string){
return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}
}
if ( false === function_exists('linkify') ):
function linkify($str) {
$pattern = '(?xi)\b((?:(http)s?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace_callback("#$pattern#i", function($matches) {
$input = $matches[0];
$url = $matches[2] == 'http' ? $input : "http://$input";
return '<a href="' . $url . '" rel="nofollow" target="_blank">' . "$input</a>";
}, $str);
}
endif;
echo "<div>" . linkify(html($row_rsgetpost['userinput'])) . "</div>";
?>
I am concerned that I may be introducing a security risk by inserting user generated content into a link. I am already escaping user content coming from my database with htmlspecialchars($string, ENT_QUOTES, 'UTF-8')
before running it through the linkify function and echoing back to the page, but I've read on OWASP that link attributes need to be treated specially to mitigate XSS. I am thinking this function is ok since it places the user-generated content inside double quotes and has already been escaped with htmlspecialchars($string, ENT_QUOTES, 'UTF-8')
, but would really appreciate someone with xss expertise to confirm this. Thanks!