Your [^<>] at the beginning is a consuming pattern that matches any char other than < and >, so it can match more than just a newline. Because it is consumed, that char ends up in the href value together with the rest of the matched string.
Instead, capture the rest of the pattern:
/(^|[^<>])\b((?:https?|ftp):\/\/[a-z0-9+&@#\/%?=~_|!:,.;-]*[a-z0-9-+&@#\/%=~_|])(?![^<>])/gi
 ^^^^^^^^^  ^                                                                  ^
The (^|[^<>]) part will be Group 1 and the rest will be captured into Group 2. Use the $1 and $2 backreferences in the replacement pattern to put the captured parts into their appropriate places:
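function repl(text) {
  // Group 1: start of string or the single preceding char (not < or >); Group 2: the URL itself.
  var exp = /(^|[^<>])\b((?:https?|ftp):\/\/[a-z0-9+&@#\/%?=~_|!:,.;-]*[a-z0-9-+&@#\/%=~_|])(?![^<>])/gi;
  // $1 puts the preceding char back in front of the link instead of inside the href.
  return text.replace(exp, '$1<a href="$2">$2</a>');
}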
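As a quick check of the idea, here is a minimal sketch that runs the function on two made-up inputs (the sample strings and the variable names atStart and afterSpace are assumptions for illustration only). Both inputs end with the URL, because the trailing (?![^<>]) lookahead in this pattern only allows <, > or the end of the input right after the match.

```javascript
// repl is redefined here so the snippet runs on its own; the pattern is unchanged.
function repl(text) {
  var exp = /(^|[^<>])\b((?:https?|ftp):\/\/[a-z0-9+&@#\/%?=~_|!:,.;-]*[a-z0-9-+&@#\/%=~_|])(?![^<>])/gi;
  return text.replace(exp, '$1<a href="$2">$2</a>');
}

// Hypothetical sample inputs.
var atStart = repl('http://example.com');        // Group 1 is empty (the ^ branch)
var afterSpace = repl('see http://example.com'); // Group 1 is the space before the URL

console.log(atStart);
// <a href="http://example.com">http://example.com</a>
console.log(afterSpace);
// see <a href="http://example.com">http://example.com</a>
```

In the second result the space stays outside the link, which is the point of putting the leading char into its own group.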
For a more comprehensive URL extraction regex, see "How can i extract URL's from a piece of text into an Array using JavaScript", which includes an example of using Diego Perini's URL regex. You may adjust it as shown here:
s.replace(/(^|[^<>])\b((?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[\/?#]\S*)?)(?![<>])/gi, '$1<a href="$2">$2</a>')
A much simpler alternative that usually works is to match any chars other than whitespace, < and > (as many as possible, with the + quantifier) after the protocol, and let the match end at the last word boundary (thanks to the trailing \b), so trailing punctuation is left out of the link:
s.replace(/(^|[^<>])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi, '$1<a href="$2">$2</a>')
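To see how the word boundary trims trailing punctuation, here is a small sketch using the same pattern. The helper name linkifySimple and the sample strings are assumptions made up for this illustration.

```javascript
// Hypothetical helper wrapping the simpler pattern shown above.
function linkifySimple(s) {
  return s.replace(/(^|[^<>])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi, '$1<a href="$2">$2</a>');
}

// The comma and the final dot are non-word chars, so \b makes the match stop before them.
var out = linkifySimple('Docs: http://example.com/path, mirror at ftp://example.org.');
console.log(out);
// Docs: <a href="http://example.com/path">http://example.com/path</a>, mirror at <a href="ftp://example.org">ftp://example.org</a>.

// A URL that directly follows > is skipped, because Group 1 cannot match < or >.
var untouched = linkifySimple('<b>http://example.com</b>');
console.log(untouched);
// <b>http://example.com</b>
```

Note that a URL ending in a non-word char such as / loses that final char from the link, which is the trade-off of relying on \b.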