We use the following regular expression to convert URLs in text to links. URLs that are too long are shortened with an ellipsis in the middle:

/**
 * Replace all links with <a> tags (shortening them if needed)
 */
$match_arr[] = '/((http|ftp)+(s)?:\/\/[^<>\s,!\)]+)/ie';
$replace_arr[] = "'<a href=\"\\0\" title=\"\\0\" target=\"_blank\">' . " .
    "( mb_strlen( '$0' ) > {$maxlength} ? mb_substr( '$0', 0, " . ( $maxlength / 2 ) . " ) . '…' . " .
    "mb_substr( '$0', -" . ( $maxlength / 2 ) . " ) : '$0' ) . " .
"'</a>'";

This is working. However, I found that if there is a link in the text already, like:

$text = '... <a href="http://www.google.com">http://www.google.com</a> ...';

it will match both URLs, so it will try to create two more <a> tags, totally messing up the DOM of course.

How can I prevent the regex from matching if the link is already inside an <a> tag? It will also be in the title attribute, so basically I just want to skip every <a> tag completely.

Rijk

1 Answer

The simplest way (with a regex, which arguably is not the most reliable tool in this situation) would probably be to make sure that the next tag following your link is not a closing </a>:

#(http|ftp)+(s)?://[^<>\s,!\)]++(?![^<]*</a>)#ie

I'm using a possessive quantifier (++) to make sure that the entire URL is matched, i.e. the engine cannot backtrack into the URL and give up characters in order to satisfy the lookahead.
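
For readers on newer PHP versions: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the same idea is probably best expressed with preg_replace_callback. The following is only a minimal sketch, not the original poster's code: the function name linkify() and the default $maxlength of 40 are assumptions for illustration, the protocol group is written as (?:https?|ftps?) instead of (http|ftp)+(s)?, htmlspecialchars() is added for the attribute values, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper (not from the original post): wraps bare URLs in <a> tags
// and leaves URLs alone when the next tag after them is a closing </a>.
function linkify(string $text, int $maxlength = 40): string
{
    // The possessive ++ keeps the whole URL; the negative lookahead then rejects
    // any URL that is followed by </a> with no other tag in between. This covers
    // both the href/title attribute values and the link text of an existing <a>.
    $pattern = '#(?:https?|ftps?)://[^<>\s,!)]++(?![^<]*</a>)#i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($maxlength): string {
            $url  = $m[0];
            $half = intdiv($maxlength, 2);

            // Shorten long URLs by keeping both ends around an ellipsis.
            $label = mb_strlen($url) > $maxlength
                ? mb_substr($url, 0, $half) . '…' . mb_substr($url, -$half)
                : $url;

            $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

            return '<a href="' . $href . '" title="' . $href . '" target="_blank">'
                . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
        },
        $text
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}
```

Called on a string such as 'see <a href="http://www.google.com">http://www.google.com</a> and http://example.com/page now', this should leave the existing anchor untouched and wrap only the second URL. The same caveat as above applies: a lookahead is a heuristic, and an HTML parser is likely the more robust choice for arbitrary markup.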

Tim Pietzcker