Possible Duplicate:
Finetune Regex to skip tags

Currently my function looks like this. It converts plain text URLs into HTML links.

function UrlsToLinks($text){
    return preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>', $text);
}

But there are some problems. I want the function to skip URLs that are already part of the markup, such as existing links and the src attribute of <img> tags. I can't figure out what I need to modify in this function.

heron
  • What is the string you are trying to parse? – gen_Eric Aug 14 '12 at 19:26
  • Your desired functionality is painfully difficult, if not impossible, to achieve using regular expressions, much less a single regular expression. You really should be using an HTML parser, looking for links only within the text content of HTML nodes. – nickb Aug 14 '12 at 19:26
  • What you need to do is use an HTML parser to extract the text nodes and only run *them* through the above function. Trying to modify it so that it will ignore bits and pieces of HTML will bring down the wrath of Tony the Pony and we will all burn in the fiery depths. Either that or your application will be insecure and unreliable, one of the two. – DaveRandom Aug 14 '12 at 19:28
  • @Rocket HTML markup, images, and URLs as plain text inside – heron Aug 14 '12 at 19:28
  • @epic_syntax: see the answer here: http://stackoverflow.com/q/11958415/1596455 – DesertEagle Aug 15 '12 at 15:42
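
The parser-based approach suggested in the comments above could look roughly like the following. This is only a sketch, not code from the question or the answer: the function name urlsToLinksDom, the wrapper id linkify-root, and the simplified URL pattern are all assumptions made for illustration. It loads the markup with PHP's DOM extension, visits only text nodes that are not inside an existing <a>, <script>, or <style> element, and replaces each URL found there with a link element.

```php
<?php
// Hypothetical helper (not from the original post): linkify URLs that occur in
// text nodes only, leaving attributes and existing links untouched.
function urlsToLinksDom(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    // The XML encoding hint makes libxml read the input as UTF-8; the wrapper
    // div (assumed id) lets us pull the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="linkify-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//div[@id="linkify-root"]//text()'
        . '[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]'
    );

    // Simplified pattern (an assumption): one capture group, so preg_split()
    // returns plain text at even indexes and URLs at odd indexes.
    $pattern = '@(https?://[^\s<>"\']+)@i';

    foreach (iterator_to_array($nodes) as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no URL in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $part);
                $a->setAttribute('target', '_blank');
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        // Swap the original text node for the mixed text/link fragment.
        $node->parentNode->insertBefore($fragment, $node);
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, not the html/body shell.
    $root = $xpath->query('//div[@id="linkify-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the links are built as DOM nodes rather than by string concatenation, the URL is escaped when the document is serialized. The simplified pattern will still swallow trailing punctuation such as a full stop directly after a URL, so it would need tightening before real use.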

1 Answer

This would work, assuming that none of the URLs we want to replace also appears inside a tag, for example in an href or src attribute or as the text of an existing link.

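function UrlsToLinks($text){
    $matches = array();
    // Search only the visible text, so URLs inside tag attributes are not matched.
    $strippedText = strip_tags($text);

    preg_match_all('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', $strippedText, $matches);

    // De-duplicate: str_replace() rewrites every occurrence at once, so a
    // repeated URL would otherwise be wrapped a second time.
    foreach (array_unique($matches[0]) as $match) {
        if (filter_var($match, FILTER_VALIDATE_URL)) {
            $text = str_replace($match, '<a href="'.$match.'" target="_blank">'.$match.'</a>', $text);
        }
    }
    return $text;
}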
Tchoupi