javascript: Match and replace non-tagged URL in textarea with regex

Question

For now I use this to replace (linkify) URLs in a textarea before submission (the textarea may contain a mix of tagged and non-tagged URLs):

function repl(text) {
  var exp = /[^<>]\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|](?![^<>])/gim;

  return text.replace(exp, '<a href="$&">$&</a>');
}

It mostly works, but a \n ends up inside href="" and inside the text node of the <a>, which is annoying.

I tried to modify the regex so that the result does NOT contain the \n, but I failed.

Can anyone help me improve this? (I use it in a bookmarklet.)

score 2 · Accepted Answer · edited May 23 '17 at 12:08

Your [^<>] at the beginning is a consuming pattern: it matches any char other than < and >, so it can match much more than just a newline (a space, a letter, a quote, and so on). Because that char is part of the match, $& puts it into the href value together with the rest of the matched string.
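To make the effect visible, here is a minimal sketch that runs the regex from the question on a made-up input. The function name replOriginal and the sample string are only for illustration.

```javascript
// The regex from the question, unchanged. `$&` inserts the WHOLE match,
// including the character consumed by the leading [^<>].
function replOriginal(text) {
  var exp = /[^<>]\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|](?![^<>])/gim;
  return text.replace(exp, '<a href="$&">$&</a>');
}

// Hypothetical textarea content: a URL on its own line.
var sample = 'see\nhttp://example.com';

// JSON.stringify makes the stray \n visible in the printed result.
console.log(JSON.stringify(replOriginal(sample)));
```

The newline that precedes the URL is consumed by [^<>], so it shows up both inside href and inside the link text.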

Instead, capture the rest of the pattern:

/(^|[^<>])\b((?:https?|ftp):\/\/[a-z0-9+&@#\/%?=~_|!:,.;-]*[a-z0-9-+&@#\/%=~_|])(?![^<>])/gi
 ^^^^^^^^^  ^                                                                  ^

The (^|[^<>]) will be Group 1 and the rest will be captured into Group 2. Use $1 and $2 backreferences in the replacement pattern to put the captured parts into their appropriate places:

function repl(text) {
  var exp = /(^|[^<>])\b((?:https?|ftp):\/\/[a-z0-9+&@#\/%?=~_|!:,.;-]*[a-z0-9-+&@#\/%=~_|])(?![^<>])/gi;
  return text.replace(exp, '$1<a href="$2">$2</a>');
}
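As a quick check of this function, here is a small sketch with made-up sample strings. The function body is repeated only so the snippet runs on its own.

```javascript
// Same function as in the answer, repeated so this snippet is self-contained.
function repl(text) {
  var exp = /(^|[^<>])\b((?:https?|ftp):\/\/[a-z0-9+&@#\/%?=~_|!:,.;-]*[a-z0-9-+&@#\/%=~_|])(?![^<>])/gi;
  return text.replace(exp, '$1<a href="$2">$2</a>');
}

// Hypothetical input: a bare URL on its own line.
// The \n is now Group 1 and is put back BEFORE the link, not inside it.
console.log(JSON.stringify(repl('see\nhttp://example.com')));

// An already tagged URL stays as it is.
var tagged = '<a href="http://example.com">http://example.com</a>';
console.log(repl(tagged) === tagged);
```

Note that the (?![^<>]) lookahead only lets a URL match when it is followed by <, > or the end of the input, so with this version a URL followed by a space or more text is not linkified.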

For a more comprehensive URL extraction regex, see "How can i extract URL's from a piece of text into an Array using JavaScript", which shows an example usage of Diego Perini's URL regex. You may adjust it as shown here:

s.replace(/(^|[^<>])\b((?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[\/?#]\S*)?)(?![<>])/gi, '$1<a href="$2">$2</a>')

A much simpler alternative that usually works is to match, after the protocol, any chars other than whitespace and < / > (as many as possible, with the + quantifier) and to end the match at a word boundary (the trailing \b), so that trailing punctuation such as a final period stays outside the link:

s.replace(/(^|[^<>])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi, '$1<a href="$2">$2</a>')
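A short sketch of how this simpler pattern behaves with several URLs in one string. The wrapper name linkifySimple and the sample text are assumptions made for this example.

```javascript
// Wraps the one-liner in a function; `linkifySimple` is a made-up name.
function linkifySimple(s) {
  return s.replace(
    /(^|[^<>])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi,
    '$1<a href="$2">$2</a>'
  );
}

// Hypothetical input with two bare URLs; the second one ends the sentence.
var text = 'Try http://a.com and http://b.com.';

// Both URLs are wrapped, and the final period is left outside the second link
// because the trailing \b forces the match to end on a word character.
console.log(linkifySimple(text));
```

Since there is no lookahead on what follows the URL, every URL in the text can match, not only the last one.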

See the regex demo here

It works better now, but it seems that it matches the last URL only. How can I replace (linkify) all URLs in the textarea? — Roy, Dec 20 '16 at 07:37
Use a better regex, check lots of other posts on SO: [*Regular expression to find URLs within a string*](http://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string) might help. I just tried to address the main issue described in your question. — Wiktor Stribiżew, Dec 20 '16 at 07:39
Another approach is to use [`.replace(/(^|[^<>])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi, '$1<a href="$2">$2</a>')`](https://regex101.com/r/jWRHc1/1). — Wiktor Stribiżew, Dec 20 '16 at 07:51

score 0 · Answer 2 · answered Dec 20 '16 at 08:01

Thanks to Wiktor Stribiżew for the suggestion, I have a fully working version now:

function repl(text) {
  var exp = /(^|[^<>"])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi;
  return text.replace(exp, '$1<a href="$2">$2</a>');
}
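The only change from the simpler pattern is the " added to the leading character class. The answer does not say why, but a likely reason is that it keeps a URL inside a double-quoted href from being wrapped a second time. A small sketch under that assumption (replNoQuote and the sample strings are made up for comparison):

```javascript
// The version from this answer: a URL directly preceded by " is skipped.
function repl(text) {
  var exp = /(^|[^<>"])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi;
  return text.replace(exp, '$1<a href="$2">$2</a>');
}

// Hypothetical comparison: the same pattern without the " exclusion.
function replNoQuote(text) {
  var exp = /(^|[^<>])\b((?:https?|ftp):\/\/[^<>\s]+\b)/gi;
  return text.replace(exp, '$1<a href="$2">$2</a>');
}

// Mixed input: one tagged URL and one bare URL.
var mixed = '<a href="http://x.com">http://x.com</a> and http://y.org';
console.log(repl(mixed));        // only http://y.org gets wrapped

// Without the " exclusion, the URL inside href is wrapped again (broken markup).
console.log(replNoQuote('<a href="http://x.com">x</a>'));
```

This sketch only covers double-quoted attributes; a URL in a single-quoted or unquoted href is still preceded by a char that the class accepts.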

Roy
